Abstract. The stable principal component pursuit (SPCP) problem is a non-smooth convex optimization problem, the
solution of which has been shown both in theory and in practice to enable one to recover the low rank and sparse components
of a matrix whose elements have been corrupted by Gaussian noise. In this paper, we first show how several existing fast
first-order methods can be applied to this problem very efficiently. Specifically, we show that the subproblems that arise
when applying optimal gradient methods of Nesterov, alternating linearization methods and alternating direction augmented
Lagrangian methods to the SPCP problem either have closed-form solutions or have solutions that can be obtained with very
modest effort. Later, we develop a new first-order algorithm, NSA, based on partial variable splitting. All but one of the methods
analyzed require at least one of the non-smooth terms in the objective function to be smoothed and obtain an $\epsilon$-optimal solution
to the SPCP problem in $O(1/\epsilon)$ iterations. NSA, which works directly with the fully non-smooth objective function, is proved to
be convergent under mild conditions on the sequence of parameters it uses. Our preliminary computational tests show that the
latter method, NSA, although its complexity is not known, is the fastest among the four algorithms described and substantially
outperforms ASALM, the only existing method for the SPCP problem. To the best of our knowledge, an algorithm for the SPCP
problem that has $O(1/\epsilon)$ iteration complexity and has a per-iteration complexity equal to that of a singular value decomposition
is given for the first time.



1. Introduction. In [2, 12], it was shown that when the data matrix $D \in \mathbb{R}^{m \times n}$ is of the form
$D = X^0 + S^0$, where $X^0$ is a low-rank matrix, i.e. $\mathrm{rank}(X^0) \ll \min\{m, n\}$, and $S^0$ is a sparse matrix, i.e.
$\|S^0\|_0 \ll mn$ ($\|\cdot\|_0$ counts the number of nonzero elements of its argument), one can recover the low-rank
and sparse components of $D$ by solving the principal component pursuit problem

$$\min_{X \in \mathbb{R}^{m \times n}} \|X\|_* + \xi \|D - X\|_1, \qquad (1.1)$$

where $\xi = \frac{1}{\sqrt{\max\{m,n\}}}$.

For $X \in \mathbb{R}^{m \times n}$, $\|X\|_*$ denotes the nuclear norm of $X$, which is equal to the sum of its singular values,
$\|X\|_1 := \sum_{i=1}^{m} \sum_{j=1}^{n} |X_{ij}|$, $\|X\|_\infty := \max\{|X_{ij}| : 1 \le i \le m,\ 1 \le j \le n\}$ and $\|X\|_2 := \sigma_{\max}(X)$, where
$\sigma_{\max}(X)$ is the maximum singular value of $X$.

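To be more precise, let $X^0 \in \mathbb{R}^{m \times n}$ with $\mathrm{rank}(X^0) = r$ and let $X^0 = U \Sigma V^T = \sum_{i=1}^{r} \sigma_i u_i v_i^T$ denote the
singular value decomposition (SVD) of $X^0$. Suppose that for some $\mu > 0$, $U$ and $V$ satisfy

$$\max_i \|U^T e_i\|_2^2 \le \frac{\mu r}{m}, \qquad \max_i \|V^T e_i\|_2^2 \le \frac{\mu r}{n}, \qquad \|U V^T\|_\infty \le \sqrt{\frac{\mu r}{mn}}, \qquad (1.2)$$

where $e_i$ denotes the $i$-th unit vector.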



Theorem 1.1. Suppose $D = X^0 + S^0$, where $X^0 \in \mathbb{R}^{m \times n}$ with $m < n$ satisfies (1.2) for some
$\mu > 0$, and the support set of $S^0$ is uniformly distributed. Then there are constants $c$, $\rho_r$, $\rho_s$ such that with
probability of at least $1 - c n^{-10}$, the principal component pursuit problem (1.1) exactly recovers $X^0$ and $S^0$
provided that

$$\mathrm{rank}(X^0) \le \rho_r\, m\, \mu^{-1} (\log(n))^{-2} \quad \text{and} \quad \|S^0\|_0 \le \rho_s\, mn. \qquad (1.3)$$

In [13], it is shown that the recovery is still possible even when the data matrix, $D$, is corrupted with a dense
error matrix, $\zeta^0$ such that $\|\zeta^0\|_F \le \delta$, by solving the stable principal component pursuit (SPCP) problem

$$(P): \quad \min_{X, S \in \mathbb{R}^{m \times n}} \{ \|X\|_* + \xi \|S\|_1 : \|X + S - D\|_F \le \delta \}. \qquad (1.4)$$

Specifically, the following theorem is proved in [13]. 
Theorem 1.2. Suppose $D = X^0 + S^0 + \zeta^0$, where $X^0 \in \mathbb{R}^{m \times n}$ with $m < n$ satisfies (1.2) for
some $\mu > 0$, and the support set of $S^0$ is uniformly distributed. If $X^0$ and $S^0$ satisfy (1.3), then for any $\zeta^0$
such that $\|\zeta^0\|_F \le \delta$ the solution, $(X^*, S^*)$, to the stable principal component pursuit problem (1.4) satisfies
$\|X^* - X^0\|_F^2 + \|S^* - S^0\|_F^2 \le C m n \delta^2$ for some constant $C$ with high probability.
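The setting of Theorem 1.2 can be illustrated with a small synthetic instance. The sketch below is only an illustration under stated assumptions: the helper name `make_spcp_instance`, the Gaussian factors for the low-rank part, the uniform magnitudes of the sparse entries and the chosen sizes are all hypothetical and are not taken from the paper.

```python
import numpy as np

def make_spcp_instance(m, n, rank, sparsity, delta, seed=0):
    """Build D = X0 + S0 + zeta0 with ||zeta0||_F = delta (illustrative generator)."""
    rng = np.random.default_rng(seed)
    # Low-rank component: product of two Gaussian factors (assumption).
    X0 = rng.standard_normal((m, rank)) @ rng.standard_normal((rank, n))
    # Sparse component: each entry is in the support independently with prob. `sparsity`.
    S0 = np.zeros((m, n))
    support = rng.random((m, n)) < sparsity
    S0[support] = rng.uniform(-10.0, 10.0, size=int(support.sum()))
    # Dense noise, rescaled so that its Frobenius norm equals delta.
    zeta0 = rng.standard_normal((m, n))
    zeta0 *= delta / np.linalg.norm(zeta0, "fro")
    return X0 + S0 + zeta0, X0, S0, zeta0

D, X0, S0, zeta0 = make_spcp_instance(40, 60, rank=3, sparsity=0.05, delta=0.5)
xi = 1.0 / np.sqrt(max(D.shape))  # the weight xi = 1/sqrt(max{m, n}) used in (1.1) and (1.4)
```

Any solver for (P) can then be run on `D` with the same `delta` and compared against `X0` and `S0`.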

Principal component pursuit and stable principal component pursuit both have applications in video 
surveillance and face recognition. For existing algorithmic approaches to solving principal component pursuit 
see 13 El [3 US] and references therein. In this paper, we develop four different fast first-order algorithms 
to solve the SPCP problem (P). The first two algorithms are direct applications of Nesterov's optimal 
algorithm [9] and the proximal gradient method of Tseng [11], which is inspired by both FISTA and Nesterov's
infinite memory algorithms that are introduced in [1] and [5], respectively. In this paper it is shown that
both algorithms can compute an $\epsilon$-optimal, feasible solution to (P) in $O(1/\epsilon)$ iterations. The third and
fourth algorithms apply an alternating direction augmented Lagrangian approach to an equivalent problem
obtained by partial variable splitting. The third algorithm can compute an $\epsilon$-optimal, feasible solution to
the problem in $O(1/\epsilon^2)$ iterations, which can be easily improved to $O(1/\epsilon)$ complexity. Given $\epsilon > 0$, the first
three algorithms all use suitably smoothed versions of at least one of the norms in the objective function. The
fourth algorithm (NSA) works directly with the original non-smooth objective function and can be shown to
converge to an optimal solution of (P), provided that a mild condition on the increasing sequence of penalty
multipliers holds. To the best of our knowledge, an algorithm for the SPCP problem that has $O(1/\epsilon)$ iteration
complexity and has a per-iteration complexity equal to that of a singular value decomposition is given for
the first time.

The only algorithm that we know of that has been designed to solve the SPCP problem (P) is the
algorithm ASALM [10]. The results of our numerical experiments comparing the NSA algorithm with ASALM
have shown that NSA is faster and also more robust to changes in problem parameters.

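2. Proximal Gradient Algorithm with Smooth Objective Function. In this section we show
that Nesterov's optimal algorithm [8, 9] for simple sets is efficient for solving (P).

For fixed parameters $\mu > 0$ and $\nu > 0$, define the smooth $C^{1,1}$ functions $f_\mu(\cdot)$ and $g_\nu(\cdot)$ as follows

$$f_\mu(X) = \max_{U \in \mathbb{R}^{m \times n} : \|U\|_2 \le 1} \langle X, U \rangle - \frac{\mu}{2} \|U\|_F^2, \qquad (2.1)$$

$$g_\nu(S) = \max_{W \in \mathbb{R}^{m \times n} : \|W\|_\infty \le 1} \langle S, W \rangle - \frac{\nu}{2} \|W\|_F^2. \qquad (2.2)$$

Clearly, $f_\mu(\cdot)$ and $g_\nu(\cdot)$ closely approximate the non-smooth functions $f(X) := \|X\|_*$ and $g(S) := \|S\|_1$,
respectively. Also let $\chi := \{(X, S) \in \mathbb{R}^{m \times n} \times \mathbb{R}^{m \times n} : \|X + S - D\|_F \le \delta\}$ and $L = \frac{1}{\mu} + \frac{1}{\nu}$, where $\frac{1}{\mu}$ and
$\frac{1}{\nu}$ are the Lipschitz constants for the gradients of $f_\mu(\cdot)$ and $g_\nu(\cdot)$, respectively. Then Nesterov's optimal
algorithm [8, 9] for simple sets applied to the problem:

$$\min_{X, S \in \mathbb{R}^{m \times n}} \{ f_\mu(X) + \xi\, g_\nu(S) : (X, S) \in \chi \}, \qquad (2.3)$$

is given by Algorithm 1.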
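The smoothed functions in (2.1) and (2.2) can be evaluated in closed form, since the maximizers are obtained by clipping singular values (for $f_\mu$) or entries (for $g_\nu$). The following is a minimal sketch of that evaluation, not code from the paper; the function names `f_mu` and `g_nu` are chosen here for illustration, and the gradient is taken to be the maximizer of the inner problem.

```python
import numpy as np

def f_mu(X, mu):
    """Value and gradient of the smoothed nuclear norm in (2.1)."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    w = np.minimum(s / mu, 1.0)              # singular values of the maximizer, clipped to [0, 1]
    value = np.sum(s * w - 0.5 * mu * w**2)
    grad = (U * w) @ Vt                      # U diag(w) V^T
    return value, grad

def g_nu(S, nu):
    """Value and gradient of the smoothed l1 norm in (2.2)."""
    W = np.clip(S / nu, -1.0, 1.0)           # entrywise maximizer, clipped to [-1, 1]
    value = np.sum(S * W - 0.5 * nu * W**2)
    return value, W
```

Both values lie within $\frac{\mu}{2}\min\{m,n\}$ and $\frac{\nu}{2}mn$, respectively, of the norms they approximate, which is why $\mu$ and $\nu$ are tied to $\epsilon$ in what follows.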

Because of the simple form of the set $\chi$, it is easy to ensure that all iterates $(Y_k^x, Y_k^s)$, $(Z_k^x, Z_k^s)$ and
$(X_{k+1}, S_{k+1})$ lie in $\chi$. Hence, Algorithm 1 enjoys the full convergence rate of $O(L/k^2)$ of Nesterov's
method. Thus, setting $\mu = \Omega(\epsilon)$ and $\nu = \Omega(\epsilon)$, Algorithm 1 computes an $\epsilon$-optimal and feasible solution
to problem (P) in $k^* = O(1/\epsilon)$ iterations. The iterates $(Y_k^x, Y_k^s)$ and $(Z_k^x, Z_k^s)$ that need to be computed at
each iteration of Algorithm 1 are solutions to an optimization problem of the form:

$$(P_s): \quad \min_{X, S \in \mathbb{R}^{m \times n}} \left\{ \frac{L}{2} \left( \|X - \tilde{X}\|_F^2 + \|S - \tilde{S}\|_F^2 \right) + \langle Q_x, X - \tilde{X} \rangle + \langle Q_s, S - \tilde{S} \rangle : (X, S) \in \chi \right\}. \qquad (2.4)$$

The following lemma shows that the solution to problems of the form $(P_s)$ can be computed efficiently.
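Algorithm 1 SMOOTH PROXIMAL GRADIENT($X_0$, $S_0$)

1: input: $X_0 \in \mathbb{R}^{m \times n}$, $S_0 \in \mathbb{R}^{m \times n}$
2: $k \leftarrow 0$
3: while $k \le k^*$ do
4: Compute $\nabla f_\mu(X_k)$ and $\nabla g_\nu(S_k)$
5: $(Y_k^x, Y_k^s) \leftarrow \mathrm{argmin}_{X,S} \{ \langle \nabla f_\mu(X_k), X \rangle + \langle \xi \nabla g_\nu(S_k), S \rangle + \frac{L}{2} ( \|X - X_k\|_F^2 + \|S - S_k\|_F^2 ) : (X, S) \in \chi \}$
6: $\Gamma_k(X, S) := \sum_{i=0}^{k} \frac{i+1}{2} \{ \langle \nabla f_\mu(X_i), X \rangle + \langle \xi \nabla g_\nu(S_i), S \rangle \}$
7: $(Z_k^x, Z_k^s) \leftarrow \mathrm{argmin}_{X,S} \{ \Gamma_k(X, S) + \frac{L}{2} ( \|X - X_0\|_F^2 + \|S - S_0\|_F^2 ) : (X, S) \in \chi \}$
8: $(X_{k+1}, S_{k+1}) \leftarrow \left( \frac{k+1}{k+3} \right) (Y_k^x, Y_k^s) + \left( \frac{2}{k+3} \right) (Z_k^x, Z_k^s)$
9: $k \leftarrow k + 1$
10: end while
11: return $(X_{k^*}, S_{k^*})$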



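Lemma 2.1. The optimal solution $(X^*, S^*)$ to problem $(P_s)$ can be written in closed form as follows.
When $\delta > 0$,

$$X^* = \frac{\theta^*}{L + 2\theta^*} \left( D - q_s(\tilde{S}) \right) + \frac{L + \theta^*}{L + 2\theta^*}\, q_x(\tilde{X}), \qquad (2.5)$$

$$S^* = \frac{\theta^*}{L + 2\theta^*} \left( D - q_x(\tilde{X}) \right) + \frac{L + \theta^*}{L + 2\theta^*}\, q_s(\tilde{S}), \qquad (2.6)$$

where $q_x(\tilde{X}) := \tilde{X} - \frac{1}{L} Q_x$, $q_s(\tilde{S}) := \tilde{S} - \frac{1}{L} Q_s$ and

$$\theta^* = \max \left\{ 0,\ \frac{L}{2} \left( \frac{\|q_x(\tilde{X}) + q_s(\tilde{S}) - D\|_F}{\delta} - 1 \right) \right\}. \qquad (2.7)$$

When $\delta = 0$,

$$X^* = \frac{1}{2} \left( D - q_s(\tilde{S}) \right) + \frac{1}{2}\, q_x(\tilde{X}) \quad \text{and} \quad S^* = \frac{1}{2} \left( D - q_x(\tilde{X}) \right) + \frac{1}{2}\, q_s(\tilde{S}). \qquad (2.8)$$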



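Proof. Suppose that $\delta > 0$. Writing the constraint in problem $(P_s)$, $(X, S) \in \chi$, as

$$\frac{1}{2} \|X + S - D\|_F^2 \le \frac{\delta^2}{2}, \qquad (2.9)$$

the Lagrangian function for (2.4) is given as

$$\mathcal{L}(X, S; \theta) = \frac{L}{2} \left( \|X - \tilde{X}\|_F^2 + \|S - \tilde{S}\|_F^2 \right) + \langle Q_x, X - \tilde{X} \rangle + \langle Q_s, S - \tilde{S} \rangle + \frac{\theta}{2} \left( \|X + S - D\|_F^2 - \delta^2 \right).$$

Therefore, the optimal solution $(X^*, S^*)$ and optimal Lagrangian multiplier $\theta^* \in \mathbb{R}$ must satisfy the
Karush-Kuhn-Tucker (KKT) conditions:
i. $\|X^* + S^* - D\|_F \le \delta$,
ii. $\theta^* \ge 0$,
iii. $\theta^* \left( \|X^* + S^* - D\|_F - \delta \right) = 0$,
iv. $L (X^* - \tilde{X}) + \theta^* (X^* + S^* - D) + Q_x = 0$,
v. $L (S^* - \tilde{S}) + \theta^* (X^* + S^* - D) + Q_s = 0$.

Conditions iv and v imply that $(X^*, S^*)$ satisfy (2.5) and (2.6), from which it follows that

$$X^* + S^* - D = \frac{L}{L + 2\theta^*} \left( q_x(\tilde{X}) + q_s(\tilde{S}) - D \right). \qquad (2.10)$$

Case 1: $\|q_x(\tilde{X}) + q_s(\tilde{S}) - D\|_F \le \delta$. Setting $X^* = q_x(\tilde{X})$, $S^* = q_s(\tilde{S})$ and $\theta^* = 0$ clearly satisfies
(2.5), (2.6) and conditions i (from (2.10)), ii and iii. Thus, this choice of variables satisfies all the five KKT
conditions.

Case 2: $\|q_x(\tilde{X}) + q_s(\tilde{S}) - D\|_F > \delta$. Set $\theta^* = \frac{L}{2} \left( \frac{\|q_x(\tilde{X}) + q_s(\tilde{S}) - D\|_F}{\delta} - 1 \right)$. Since $\|q_x(\tilde{X}) + q_s(\tilde{S}) - D\|_F > \delta$,
$\theta^* > 0$; hence, ii is satisfied. Moreover, for this value of $\theta^*$, it follows from (2.10) that $\|X^* + S^* - D\|_F = \delta$.
Thus, KKT conditions i and iii are satisfied.

Therefore, setting $X^*$ and $S^*$ according to (2.5) and (2.6), respectively, and setting $\theta^*$ according to (2.7)
satisfies all the five KKT conditions.

Now, suppose that $\delta = 0$. Since $S^* = D - X^*$, problem $(P_s)$ can be written as

$$\min_{X \in \mathbb{R}^{m \times n}} \left\| X - \tilde{X} + \frac{Q_x}{L} \right\|_F^2 + \left\| D - X - \tilde{S} + \frac{Q_s}{L} \right\|_F^2,$$

which is also equivalent to the problem: $\min_{X \in \mathbb{R}^{m \times n}} \|X - q_x(\tilde{X})\|_F^2 + \|X - (D - q_s(\tilde{S}))\|_F^2$. Then (2.8)
trivially follows from first-order optimality conditions for this problem and the fact that $S^* = D - X^*$. □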
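The closed form of Lemma 2.1 translates directly into a few lines of NumPy. The sketch below is an illustration only; the function name `solve_Ps` and its argument names are hypothetical, and the update formulas are the ones obtained by solving KKT conditions iv and v for $X^*$ and $S^*$ with the multiplier from Case 1 or Case 2 of the proof.

```python
import numpy as np

def solve_Ps(X_t, S_t, Q_x, Q_s, D, L, delta):
    """Closed-form minimizer of (P_s); returns (X, S, theta)."""
    qx = X_t - Q_x / L
    qs = S_t - Q_s / L
    if delta > 0:
        r = np.linalg.norm(qx + qs - D, "fro")
        theta = max(0.0, 0.5 * L * (r / delta - 1.0))   # multiplier from Case 1 / Case 2
        a = theta / (L + 2.0 * theta)
        b = (L + theta) / (L + 2.0 * theta)
    else:
        theta = np.inf                                   # equality constraint X + S = D
        a = b = 0.5
    X = a * (D - qs) + b * qx
    S = a * (D - qx) + b * qs
    return X, S, theta
```

The cost is a handful of matrix additions and one Frobenius norm, which is why each iteration of Algorithm 1 is dominated by the gradient computation rather than by this projection-like step.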

3. Proximal Gradient Algorithm with Partially Smooth Objective Function. In this section
we show how the proximal gradient algorithm, Algorithm 3 in [11], can be applied to the problem

$$\min_{X, S \in \mathbb{R}^{m \times n}} \{ f_\mu(X) + \xi \|S\|_1 : (X, S) \in \chi \}, \qquad (3.1)$$

where $f_\mu(\cdot)$ is the smooth function defined in (2.1) such that $\nabla f_\mu(\cdot)$ is Lipschitz continuous with constant
$L_\mu = \frac{1}{\mu}$. This algorithm is given in Algorithm 2.



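Algorithm 2 PARTIALLY SMOOTH PROXIMAL GRADIENT($X_0$, $S_0$)

1: input: $X_0 \in \mathbb{R}^{m \times n}$, $S_0 \in \mathbb{R}^{m \times n}$
2: $(Z_0^x, Z_0^s) \leftarrow (X_0, S_0)$, $k \leftarrow 0$
3: while $k \le k^*$ do
4: $(Y_k^x, Y_k^s) \leftarrow \left( \frac{k}{k+2} \right) (X_k, S_k) + \left( \frac{2}{k+2} \right) (Z_k^x, Z_k^s)$
5: Compute $\nabla f_\mu(Y_k^x)$
6: $(Z_{k+1}^x, Z_{k+1}^s) \leftarrow \mathrm{argmin}_{X,S} \{ \sum_{i=0}^{k} \frac{i+1}{2} \{ \xi \|S\|_1 + \langle \nabla f_\mu(Y_i^x), X \rangle \} + \frac{L}{2} \|X - X_0\|_F^2 : (X, S) \in \chi \}$
7: $(X_{k+1}, S_{k+1}) \leftarrow \left( \frac{k}{k+2} \right) (X_k, S_k) + \left( \frac{2}{k+2} \right) (Z_{k+1}^x, Z_{k+1}^s)$
8: $k \leftarrow k + 1$
9: end while
10: return $(X_{k^*}, S_{k^*})$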



Mimicking the proof in [11], it is easy to show that Algorithm 2, which uses the prox function $\frac{1}{2} \|X - X_0\|_F^2$,
converges to the optimal solution of (3.1). Given $(X_0, S_0) \in \chi$, e.g. $X_0 = 0$ and $S_0 = D$, the
current algorithm keeps all iterates in $\chi$ as in Algorithm 1 and hence it enjoys the full convergence rate
of $O(L/k^2)$. Thus, setting $\mu = \Omega(\epsilon)$, Algorithm 2 computes an $\epsilon$-optimal, feasible solution of problem (P)
in $k^* = O(1/\epsilon)$ iterations.

The only thing left to be shown is that the optimization subproblems in Algorithm 2 can be solved
efficiently. The subproblem that has to be solved at each iteration to compute $(Z_{k+1}^x, Z_{k+1}^s)$ has the form:

$$(P_{ns}): \quad \min_{X, S \in \mathbb{R}^{m \times n}} \left\{ \xi \|S\|_1 + \langle Q, X - \tilde{X} \rangle + \frac{\rho}{2} \|X - \tilde{X}\|_F^2 : (X, S) \in \chi \right\}, \qquad (3.2)$$

for some $\rho > 0$. Lemma 3.1 shows that these computations can be done efficiently.
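Lemma 3.1. The optimal solution $(X^*, S^*)$ to problem $(P_{ns})$ can be written in closed form as follows.
When $\delta > 0$,

$$S^* = \mathrm{sign}\left( D - q(\tilde{X}) \right) \odot \max \left\{ |D - q(\tilde{X})| - \xi\, \frac{\rho + \theta^*}{\rho\, \theta^*}\, E,\ 0 \right\}, \qquad (3.3)$$

$$X^* = \frac{\theta^*}{\rho + \theta^*} (D - S^*) + \frac{\rho}{\rho + \theta^*}\, q(\tilde{X}), \qquad (3.4)$$

where $q(\tilde{X}) := \tilde{X} - \frac{1}{\rho} Q$, $E$ and $0 \in \mathbb{R}^{m \times n}$ are matrices with all components equal to ones and zeros,
respectively, and $\odot$ denotes the componentwise multiplication operator. $\theta^* = 0$ if $\|D - q(\tilde{X})\|_F \le \delta$; otherwise,
$\theta^*$ is the unique positive solution of the nonlinear equation $\phi(\theta) = \delta$, where

$$\phi(\theta) := \left\| \min \left\{ \frac{\xi}{\theta}\, E,\ \frac{\rho}{\rho + \theta}\, |D - q(\tilde{X})| \right\} \right\|_F. \qquad (3.5)$$

Moreover, $\theta^*$ can be efficiently computed in $O(mn \log(mn))$ time.
When $\delta = 0$,

$$S^* = \mathrm{sign}\left( D - q(\tilde{X}) \right) \odot \max \left\{ |D - q(\tilde{X})| - \frac{\xi}{\rho}\, E,\ 0 \right\} \quad \text{and} \quad X^* = D - S^*. \qquad (3.6)$$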



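Proof. Suppose that $\delta > 0$. Let $(X^*, S^*)$ be an optimal solution to problem $(P_{ns})$ and $\theta^*$ denote the
optimal Lagrangian multiplier for the constraint $(X, S) \in \chi$ written as (2.9). Then the KKT optimality
conditions for this problem are
i. $Q + \rho (X^* - \tilde{X}) + \theta^* (X^* + S^* - D) = 0$,
ii. $\xi G + \theta^* (X^* + S^* - D) = 0$ and $G \in \partial \|S^*\|_1$,
iii. $\|X^* + S^* - D\|_F \le \delta$,
iv. $\theta^* \ge 0$,
v. $\theta^* \left( \|X^* + S^* - D\|_F - \delta \right) = 0$.

From i and ii we have

$$\begin{bmatrix} (\rho + \theta^*) I & \theta^* I \\ \theta^* I & \theta^* I \end{bmatrix} \begin{bmatrix} X^* \\ S^* \end{bmatrix} = \begin{bmatrix} \theta^* D + \rho\, q(\tilde{X}) \\ \theta^* D - \xi G \end{bmatrix}, \qquad (3.7)$$

where $q(\tilde{X}) = \tilde{X} - \frac{1}{\rho} Q$. From (3.7) it follows that

$$\begin{bmatrix} (\rho + \theta^*) I & \theta^* I \\ 0 & \frac{\rho\, \theta^*}{\rho + \theta^*} I \end{bmatrix} \begin{bmatrix} X^* \\ S^* \end{bmatrix} = \begin{bmatrix} \theta^* D + \rho\, q(\tilde{X}) \\ \frac{\rho\, \theta^*}{\rho + \theta^*} \left( D - q(\tilde{X}) \right) - \xi G \end{bmatrix}. \qquad (3.8)$$

From the second equation in (3.8), we have

$$\xi\, \frac{\rho + \theta^*}{\rho\, \theta^*}\, G + S^* + q(\tilde{X}) - D = 0. \qquad (3.9)$$

But (3.9) is precisely the first-order optimality conditions for the "shrinkage" problem

$$\min_{S \in \mathbb{R}^{m \times n}} \left\{ \xi\, \frac{\rho + \theta^*}{\rho\, \theta^*}\, \|S\|_1 + \frac{1}{2} \|S + q(\tilde{X}) - D\|_F^2 \right\}.$$

Thus, $S^*$ is the optimal solution to the "shrinkage" problem and is given by (3.3). (3.4) follows from the
first equation in (3.8), and it implies

$$X^* + S^* - D = \frac{\rho}{\rho + \theta^*} \left( S^* + q(\tilde{X}) - D \right). \qquad (3.10)$$

Therefore,

$$\begin{aligned}
\|X^* + S^* - D\|_F &= \frac{\rho}{\rho + \theta^*} \left\| S^* + q(\tilde{X}) - D \right\|_F, \\
&= \frac{\rho}{\rho + \theta^*} \left\| \mathrm{sign}\left( D - q(\tilde{X}) \right) \odot \max \left\{ |D - q(\tilde{X})| - \xi\, \frac{\rho + \theta^*}{\rho\, \theta^*}\, E,\ 0 \right\} - \left( D - q(\tilde{X}) \right) \right\|_F, \\
&= \frac{\rho}{\rho + \theta^*} \left\| \max \left\{ |D - q(\tilde{X})| - \xi\, \frac{\rho + \theta^*}{\rho\, \theta^*}\, E,\ 0 \right\} - |D - q(\tilde{X})| \right\|_F, \\
&= \frac{\rho}{\rho + \theta^*} \left\| \min \left\{ \xi\, \frac{\rho + \theta^*}{\rho\, \theta^*}\, E,\ |D - q(\tilde{X})| \right\} \right\|_F, \\
&= \left\| \min \left\{ \frac{\xi}{\theta^*}\, E,\ \frac{\rho}{\rho + \theta^*}\, |D - q(\tilde{X})| \right\} \right\|_F, \qquad (3.11)
\end{aligned}$$

where the second equation uses (3.3). Now let $\phi : \mathbb{R}_+ \to \mathbb{R}$ be

$$\phi(\theta) := \left\| \min \left\{ \frac{\xi}{\theta}\, E,\ \frac{\rho}{\rho + \theta}\, |D - q(\tilde{X})| \right\} \right\|_F. \qquad (3.12)$$

Case 1: $\|D - q(\tilde{X})\|_F \le \delta$. $\theta^* = 0$, $S^* = 0$ and $X^* = q(\tilde{X})$ trivially satisfy all the KKT conditions.

Case 2: $\|D - q(\tilde{X})\|_F > \delta$. It is easy to show that $\phi(\cdot)$ is a strictly decreasing function of $\theta$. Since
$\phi(0) = \|D - q(\tilde{X})\|_F > \delta$ and $\lim_{\theta \to \infty} \phi(\theta) = 0$, there exists a unique $\theta^* > 0$ such that $\phi(\theta^*) = \delta$. Given $\theta^*$,
$S^*$ and $X^*$ can then be computed from equations (3.3) and (3.4), respectively. Moreover, since $\theta^* > 0$ and
$\phi(\theta^*) = \delta$, (3.11) implies that $X^*$, $S^*$ and $\theta^*$ satisfy the KKT conditions.

We now show that $\theta^*$ can be computed in $O(mn \log(mn))$ time. Let $A := |D - q(\tilde{X})|$ and $0 \le a_{(1)} \le
a_{(2)} \le \cdots \le a_{(mn)}$ be the $mn$ elements of the matrix $A$ sorted in increasing order, which can be done in
$O(mn \log(mn))$ time. Defining $a_{(0)} := 0$ and $a_{(mn+1)} := \infty$, we then have for all $j \in \{0, 1, \ldots, mn\}$ that

$$\frac{\rho}{\rho + \theta}\, a_{(j)} \le \frac{\xi}{\theta} \le \frac{\rho}{\rho + \theta}\, a_{(j+1)} \iff \frac{1}{\xi}\, a_{(j)} - \frac{1}{\rho} \le \frac{1}{\theta} \le \frac{1}{\xi}\, a_{(j+1)} - \frac{1}{\rho}. \qquad (3.13)$$

For all $\bar{k} < j \le mn$ define $\theta_j$ such that $\frac{1}{\theta_j} = \frac{1}{\xi}\, a_{(j)} - \frac{1}{\rho}$ and let $\bar{k} := \max \left\{ j : \frac{1}{\xi}\, a_{(j)} - \frac{1}{\rho} \le 0,\ j \in \{0, 1, \ldots, mn\} \right\}$.
Then for all $\bar{k} < j \le mn$

$$\phi(\theta_j) = \sqrt{ \left( \frac{\rho}{\rho + \theta_j} \right)^2 \sum_{i=0}^{j} a_{(i)}^2 + (mn - j) \left( \frac{\xi}{\theta_j} \right)^2 }. \qquad (3.14)$$

Also define $\theta_{\bar{k}} := \infty$ and $\theta_{mn+1} := 0$ so that $\phi(\theta_{\bar{k}}) := 0$ and $\phi(\theta_{mn+1}) = \phi(0) = \|A\|_F > \delta$. Note that
$\{\theta_j\}_{\{\bar{k} < j \le mn\}}$ contains all the points at which $\phi(\theta)$ may not be differentiable for $\theta \ge 0$. Define $j^* := \max\{ j :
\phi(\theta_j) \le \delta,\ \bar{k} \le j \le mn \}$. Then $\theta^*$ is the unique solution of the system

$$\left( \frac{\rho}{\rho + \theta} \right)^2 \sum_{i=0}^{j^*} a_{(i)}^2 + (mn - j^*) \left( \frac{\xi}{\theta} \right)^2 = \delta^2 \quad \text{and} \quad \theta > 0, \qquad (3.15)$$

since $\phi(\theta)$ is continuous and strictly decreasing in $\theta$ for $\theta \ge 0$. Solving the equation in (3.15) requires finding
the roots of a fourth-order polynomial (a.k.a. quartic function); therefore, one can compute $\theta^* > 0$ using
the algebraic solutions of quartic equations (as shown by Lodovico Ferrari in 1540), which requires $O(1)$
operations.

Note that if $\bar{k} = mn$, then $\theta^*$ is the solution of the equation

$$\left( \frac{\rho}{\rho + \theta^*} \right)^2 \sum_{i=1}^{mn} a_{(i)}^2 = \delta^2, \qquad (3.16)$$

i.e. $\theta^* = \rho \left( \frac{\|A\|_F}{\delta} - 1 \right) = \rho \left( \frac{\|D - q(\tilde{X})\|_F}{\delta} - 1 \right)$. Hence, we have proved that problem $(P_{ns})$ can be solved
efficiently.

Now, suppose that $\delta = 0$. Since $S^* = D - X^*$, problem $(P_{ns})$ can be written as

$$\min_{S \in \mathbb{R}^{m \times n}} \xi \|S\|_1 + \frac{\rho}{2} \left\| S - \left( D - q(\tilde{X}) \right) \right\|_F^2. \qquad (3.17)$$

Then (3.6) trivially follows from first-order optimality conditions for the above problem and the fact that
$X^* = D - S^*$. □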
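The construction in Lemma 3.1 can be sketched in NumPy as follows. This is a hedged illustration rather than the procedure of the proof: instead of sorting the entries and solving the quartic (3.15), the multiplier $\theta^*$ is located by plain bisection on the strictly decreasing function $\phi$, using the upper bracket $\xi \sqrt{mn} / \delta$ (which follows from $\phi(\theta) \le \|\frac{\xi}{\theta} E\|_F$). The names `phi` and `solve_Pns` are hypothetical.

```python
import numpy as np

def phi(theta, A, xi, rho):
    """phi(theta) from (3.5), with A = |D - q|; requires theta > 0."""
    return np.linalg.norm(np.minimum(xi / theta, rho / (rho + theta) * A), "fro")

def solve_Pns(X_t, Q, D, xi, rho, delta, n_bisect=200):
    """Solve (P_ns); returns (X, S, theta). Bisection replaces the exact quartic root."""
    q = X_t - Q / rho
    R = D - q
    A = np.abs(R)
    if delta == 0:
        S = np.sign(R) * np.maximum(A - xi / rho, 0.0)      # (3.6)
        return D - S, S, np.inf
    if np.linalg.norm(A, "fro") <= delta:                   # Case 1: constraint inactive
        return q, np.zeros_like(D), 0.0
    lo, hi = 0.0, xi * np.sqrt(A.size) / delta              # phi(hi) <= delta < phi(0)
    for _ in range(n_bisect):
        mid = 0.5 * (lo + hi)
        if phi(mid, A, xi, rho) > delta:
            lo = mid
        else:
            hi = mid
    theta = hi
    tau = xi * (rho + theta) / (rho * theta)                # shrinkage threshold in (3.3)
    S = np.sign(R) * np.maximum(A - tau, 0.0)
    X = (theta * (D - S) + rho * q) / (rho + theta)         # (3.4)
    return X, S, theta
```

Bisection costs one pass over the matrix per step, so it is slower than the sort-based method in the worst case, but it is short and avoids the numerical care that closed-form quartic roots require.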

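The following lemma will be used later in Section 5. However, we give its proof here, since it uses some
equations from the proof of Lemma 3.1. Let $\mathbf{1}_\chi(\cdot, \cdot)$ denote the indicator function of the closed convex set
$\chi \subset \mathbb{R}^{m \times n} \times \mathbb{R}^{m \times n}$, i.e. if $(Z, S) \in \chi$, then $\mathbf{1}_\chi(Z, S) = 0$; otherwise, $\mathbf{1}_\chi(Z, S) = \infty$.

Lemma 3.2. Suppose that $\delta > 0$. Let $(X^*, S^*)$ be an optimal solution to problem $(P_{ns})$ and $\theta^*$ be an
optimal Lagrangian multiplier such that $(X^*, S^*)$ and $\theta^*$ together satisfy the KKT conditions, i-v in the proof
of Lemma 3.1. Then $(W^*, W^*) \in \partial \mathbf{1}_\chi(X^*, S^*)$, where $W^* := -Q + \rho (\tilde{X} - X^*) = \theta^* (X^* + S^* - D)$.

Proof. Let $W^* := -Q + \rho (\tilde{X} - X^*)$, then from i and v of the KKT optimality conditions in the proof
of Lemma 3.1, we have $W^* = \theta^* (X^* + S^* - D)$ and

$$\|W^*\|_F = \theta^* \|X^* + S^* - D\|_F = \theta^* \left( \|X^* + S^* - D\|_F - \delta \right) + \theta^* \delta = \theta^* \delta. \qquad (3.18)$$

Moreover, for all $(X, S) \in \chi$, it follows from the definition of $\chi$ that $\langle W^*, \theta^* (X + S - D) \rangle \le \theta^* \|W^*\|_F \|X +
S - D\|_F \le \theta^* \delta \|W^*\|_F$. Thus, for all $(X, S) \in \chi$, we have $\langle W^*, W^* \rangle = \|W^*\|_F^2 = \theta^* \delta \|W^*\|_F \ge \langle W^*, \theta^* (X +
S - D) \rangle$. Hence,

$$0 \ge \langle W^*, \theta^* (X + S - D) - W^* \rangle = \langle W^*, \theta^* (X - X^* + S - S^*) \rangle \quad \forall\, (X, S) \in \chi. \qquad (3.19)$$

It follows from the proof of Lemma 3.1 that if $\|D - q(\tilde{X})\|_F > \delta$, then $\theta^* > 0$, where $q(\tilde{X}) = \tilde{X} - \frac{1}{\rho} Q$.
Therefore, (3.19) implies that

$$0 \ge \langle W^*, X - X^* + S - S^* \rangle \quad \forall\, (X, S) \in \chi. \qquad (3.20)$$

On the other hand, if $\|D - q(\tilde{X})\|_F \le \delta$, then $\theta^* = 0$. Hence $W^* = \theta^* (X^* + S^* - D) = 0$, and (3.20) follows
trivially. Therefore, (3.20) always holds and this shows that $(W^*, W^*) \in \partial \mathbf{1}_\chi(X^*, S^*)$. □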

4. Alternating Linearization and Augmented Lagrangian Algorithms. In this and the next
section we present algorithms for solving problems (3.1) and (1.4) that are based on partial variable splitting
combined with alternating minimization of a suitably linearized augmented Lagrangian function. We can
write problems (1.4) and (3.1) generically as

$$\min_{X, S \in \mathbb{R}^{m \times n}} \{ \phi(X) + \xi\, g(S) : (X, S) \in \chi \}. \qquad (4.1)$$

For problem (1.4), $\phi(X) = f(X) = \|X\|_*$, while for problem (3.1), $\phi(X) = f_\mu(X)$ given in (2.1).

In this section, we first assume that $\phi : \mathbb{R}^{m \times n} \to \mathbb{R}$ and $g : \mathbb{R}^{m \times n} \times \mathbb{R}^{m \times n} \to \mathbb{R}$ are any
closed convex functions such that $\nabla \phi$ is Lipschitz continuous, and $\chi$ is a general closed convex set. Here we
use partial variable splitting, i.e. we only split the $X$ variables in (4.1), to arrive at the following equivalent
problem

$$\min_{X, Z, S \in \mathbb{R}^{m \times n}} \{ \phi(X) + \xi\, g(S) : X = Z,\ (Z, S) \in \chi \}. \qquad (4.2)$$

Let $\psi(Z, S) := \xi\, g(S) + \mathbf{1}_\chi(Z, S)$ and define the augmented Lagrangian function

$$\mathcal{L}_\rho(X, Z, S; Y) = \phi(X) + \psi(Z, S) + \langle Y, X - Z \rangle + \frac{\rho}{2} \|X - Z\|_F^2. \qquad (4.3)$$

Then minimizing (4.3) by alternating between $X$ and then $(Z, S)$ leads to several possible methods that can
compute a solution to (4.2). These include the alternating linearization method (ALM) with skipping step
that has an $O(\frac{\rho}{k})$ convergence rate, and the fast version of this method with an $O(\frac{\rho}{k^2})$ rate (see [3] for full
splitting versions of these methods). In this paper, we only provide a proof of the complexity result for the
alternating linearization method with skipping steps (ALM-S) in Theorem 4.1 below. One can easily extend
the proof of Theorem 4.1 to an ALM method based on (4.3) with the function $g(S)$ replaced by a suitably
smoothed version (see [3] for the details of the ALM algorithm).

Algorithm 3 ALM-S($Y_0$)

1: input: $X_0 \in \mathbb{R}^{m \times n}$, $S_0 \in \mathbb{R}^{m \times n}$, $Y_0 \in \mathbb{R}^{m \times n}$
2: $Z_0 \leftarrow X_0$, $k \leftarrow 0$
3: while $k \ge 0$ do
4: $X_{k+1} \leftarrow \mathrm{argmin}_X\, \mathcal{L}_\rho(X, Z_k, S_k; Y_k)$
5: if $\phi(X_{k+1}) + \psi(X_{k+1}, S_k) > \mathcal{L}_\rho(X_{k+1}, Z_k, S_k; Y_k)$ then
6: $X_{k+1} \leftarrow Z_k$
7: end if
8: $(Z_{k+1}, S_{k+1}) \leftarrow \mathrm{argmin}_{Z,S}\ \psi(Z, S) + \phi(X_{k+1}) + \langle \nabla \phi(X_{k+1}), Z - X_{k+1} \rangle + \frac{\rho}{2} \|Z - X_{k+1}\|_F^2$
9: $Y_{k+1} \leftarrow -\nabla \phi(X_{k+1}) + \rho (X_{k+1} - Z_{k+1})$
10: $k \leftarrow k + 1$
11: end while

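Theorem 4.1. Let $\phi : \mathbb{R}^{m \times n} \to \mathbb{R}$ and $\psi : \mathbb{R}^{m \times n} \times \mathbb{R}^{m \times n} \to \mathbb{R}$ be closed convex functions such that $\nabla \phi$ is
Lipschitz continuous with Lipschitz constant $L$, and $\chi$ be a closed convex set. Let $\Phi(X, S) := \phi(X) + \psi(X, S)$.
For $\rho \ge L$, the sequence $\{Z_k, S_k\}_{k \in \mathbb{Z}_+}$ in Algorithm ALM-S satisfies

$$\Phi(Z_k, S_k) - \Phi(X^*, S^*) \le \frac{\rho\, \|X_0 - X^*\|_F^2}{2 (k + n_k)}, \qquad (4.4)$$

where $(X^*, S^*) = \mathrm{argmin}_{X, S \in \mathbb{R}^{m \times n}} \Phi(X, S)$, $n_k := \sum_{i=0}^{k-1} \mathbf{1}\{ \Phi(X_{i+1}, S_i) \le \mathcal{L}_\rho(X_{i+1}, Z_i, S_i; Y_i) \}$ and $\mathbf{1}\{\cdot\}$ is 1 if its
argument is true; otherwise, 0.

Proof. See Appendix A for the proof. □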

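We obtain Algorithm 4 by applying Algorithm 3 to solve problem (3.1), where the smooth function
$\phi(X) = f_\mu(X)$, defined in (2.1), the non-smooth closed convex function is $\xi \|S\|_1 + \mathbf{1}_\chi(X, S)$ and $\chi =
\{(X, S) \in \mathbb{R}^{m \times n} \times \mathbb{R}^{m \times n} : \|X + S - D\|_F \le \delta\}$. Theorem 4.1 shows that Algorithm 4 has an iteration
complexity of $O(\frac{1}{\epsilon^2})$ to obtain an $\epsilon$-optimal and feasible solution of (P).

Algorithm 4 PARTIALLY SMOOTH ALM($Y_0$)

1: input: $Y_0 \in \mathbb{R}^{m \times n}$
2: $Z_0 \leftarrow 0$, $S_0 \leftarrow D$, $k \leftarrow 0$
3: while $k \ge 0$ do
4: $X_{k+1} \leftarrow \mathrm{argmin}_X\ f_\mu(X) + \langle Y_k, X - Z_k \rangle + \frac{\rho}{2} \|X - Z_k\|_F^2$
5: $B_k \leftarrow f_\mu(X_{k+1}) + \xi \|S_k\|_1 + \langle Y_k, X_{k+1} - Z_k \rangle + \frac{\rho}{2} \|X_{k+1} - Z_k\|_F^2$
6: if $f_\mu(X_{k+1}) + \xi \|S_k\|_1 + \mathbf{1}_\chi(X_{k+1}, S_k) > B_k$ then
7: $X_{k+1} \leftarrow Z_k$
8: end if
9: $(Z_{k+1}, S_{k+1}) \leftarrow \mathrm{argmin}_{Z,S} \{ \xi \|S\|_1 + \langle \nabla f_\mu(X_{k+1}), Z - X_{k+1} \rangle + \frac{\rho}{2} \|Z - X_{k+1}\|_F^2 : (Z, S) \in \chi \}$
10: $Y_{k+1} \leftarrow -\nabla f_\mu(X_{k+1}) + \rho (X_{k+1} - Z_{k+1})$
11: $k \leftarrow k + 1$
12: end while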



Using the fast version of Algorithm 3, a fast version of Algorithm 4 with $O(\rho/k^2)$ convergence rate,
employing partial splitting and alternating linearization, can be constructed. This fast version can compute
an $\epsilon$-optimal and feasible solution to problem (P) in $O(1/\epsilon)$ iterations. Moreover, like the proximal gradient
methods described earlier, each iteration for these methods can be computed efficiently. The subproblems
to be solved at each iteration of Algorithm 4 and its fast version have the following generic form:

$$\min_{X \in \mathbb{R}^{m \times n}} f_\mu(X) + \langle Q, X - \tilde{X} \rangle + \frac{\rho}{2} \|X - \tilde{X}\|_F^2, \qquad (4.5)$$

$$\min_{Z, S \in \mathbb{R}^{m \times n}} \left\{ \xi \|S\|_1 + \langle Q, Z - \tilde{Z} \rangle + \frac{\rho}{2} \|Z - \tilde{Z}\|_F^2 : (Z, S) \in \chi \right\}. \qquad (4.6)$$

Let $U \mathrm{diag}(\sigma) V^T$ denote the singular value decomposition of the matrix $\tilde{X} - Q/\rho$, then $X^*$, the minimizer of
the subproblem in (4.5), can be easily computed as $U \mathrm{diag}\left( \sigma - \frac{\sigma}{\max\{\rho \sigma,\ 1 + \rho \mu\}} \right) V^T$. And Lemma 3.1 shows
how to solve the subproblem in (4.6).
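The singular-value formula for the minimizer of (4.5) can be checked numerically. The sketch below is illustrative only; `prox_f_mu` and `grad_f_mu` are names introduced here, and the gradient of $f_\mu$ is evaluated as $U \mathrm{diag}(\min\{\sigma/\mu, 1\}) V^T$, i.e. the maximizer in (2.1).

```python
import numpy as np

def grad_f_mu(X, mu):
    """Gradient of the smoothed nuclear norm f_mu at X."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U * np.minimum(s / mu, 1.0)) @ Vt

def prox_f_mu(X_t, Q, rho, mu):
    """Minimizer of f_mu(X) + <Q, X - X_t> + (rho/2) ||X - X_t||_F^2, as in (4.5)."""
    U, s, Vt = np.linalg.svd(X_t - Q / rho, full_matrices=False)
    # Elementwise on singular values: sigma - sigma / max{rho * sigma, 1 + rho * mu}.
    s_new = s - s / np.maximum(rho * s, 1.0 + rho * mu)
    return (U * s_new) @ Vt
```

At the returned point the stationarity condition $\nabla f_\mu(X^*) + Q + \rho (X^* - \tilde{X}) = 0$ should hold up to rounding, which is a convenient sanity check when experimenting with $\rho$ and $\mu$.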



5. Non-smooth Augmented Lagrangian Algorithm. Algorithm 5 is a Non-Smooth Augmented
Lagrangian Algorithm (NSA) that solves the non-smooth problem (P). The subproblem in Step 4 of
Algorithm 5 is a matrix shrinkage problem and can be solved efficiently by computing a singular value
decomposition (SVD) of an $m \times n$ matrix; and Lemma 3.1 shows that the subproblem in Step 6 can also be
solved efficiently.

Algorithm 5 NSA($Z_0$, $Y_0$)

1: input: $Z_0 \in \mathbb{R}^{m \times n}$, $Y_0 \in \mathbb{R}^{m \times n}$
2: $k \leftarrow 0$
3: while $k \ge 0$ do
4: $X_{k+1} \leftarrow \mathrm{argmin}_X \{ \|X\|_* + \langle Y_k, X - Z_k \rangle + \frac{\rho_k}{2} \|X - Z_k\|_F^2 \}$
5: $\hat{Y}_{k+1} \leftarrow Y_k + \rho_k (X_{k+1} - Z_k)$
6: $(Z_{k+1}, S_{k+1}) \leftarrow \mathrm{argmin}_{\{(Z,S) :\ \frac{1}{2} \|Z + S - D\|_F^2 \le \frac{\delta^2}{2}\}} \{ \xi \|S\|_1 + \langle -Y_k, Z - X_{k+1} \rangle + \frac{\rho_k}{2} \|Z - X_{k+1}\|_F^2 \}$
7: Let $\theta_k$ be an optimal Lagrangian dual variable for the $\frac{1}{2} \|Z + S - D\|_F^2 \le \frac{\delta^2}{2}$ constraint
8: $Y_{k+1} \leftarrow Y_k + \rho_k (X_{k+1} - Z_{k+1})$
9: Choose $\rho_{k+1}$ such that $\rho_{k+1} \ge \rho_k$
10: $k \leftarrow k + 1$
11: end while
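A compact NumPy sketch of the NSA iteration is given below for orientation. It is not the authors' implementation and rests on several assumptions that are not in the text: $\delta > 0$; a fixed iteration count replaces the open-ended loop; the penalty schedule $\rho_k = \rho_0 \sqrt{k+1}$ is a hypothetical choice that is non-decreasing and keeps both $\sum 1/\rho_k$ and $\sum 1/\rho_k^2$ divergent; and the multiplier $\theta_k$ in Step 6 is found by bisection on $\phi$ from Lemma 3.1 rather than by the exact quartic solve. The names `svt`, `zs_step` and `nsa` are introduced here.

```python
import numpy as np

def svt(M, tau):
    """Singular value soft-thresholding: argmin_X ||X||_* + (1/(2*tau)) ||X - M||_F^2."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt

def zs_step(q, D, xi, rho, delta):
    """Step 6 via Lemma 3.1 with q = X_{k+1} + Y_k / rho_k; theta by bisection (assumption)."""
    R = D - q
    A = np.abs(R)
    if np.linalg.norm(A) <= delta:
        return q, np.zeros_like(D), 0.0
    lo, hi = 0.0, xi * np.sqrt(A.size) / delta      # phi(hi) <= delta
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        val = np.linalg.norm(np.minimum(xi / mid, rho / (rho + mid) * A))
        if val > delta:
            lo = mid
        else:
            hi = mid
    theta = hi
    S = np.sign(R) * np.maximum(A - xi * (rho + theta) / (rho * theta), 0.0)
    Z = (theta * (D - S) + rho * q) / (rho + theta)
    return Z, S, theta

def nsa(D, delta, rho0=1.0, iters=100):
    """Run a fixed number of NSA iterations starting from Z_0 = 0, Y_0 = 0."""
    xi = 1.0 / np.sqrt(max(D.shape))
    X = np.zeros_like(D)
    Z = np.zeros_like(D)
    S = np.zeros_like(D)
    Y = np.zeros_like(D)
    for k in range(iters):
        rho = rho0 * np.sqrt(k + 1.0)               # hypothetical penalty schedule
        X = svt(Z - Y / rho, 1.0 / rho)             # Step 4
        Z, S, _ = zs_step(X + Y / rho, D, xi, rho, delta)   # Steps 6-7
        Y = Y + rho * (X - Z)                       # Step 8
    return X, Z, S
```

The pair `(Z, S)` returned at every iteration is feasible for (P) by construction, while `X` carries the low-rank structure; the gap between `X` and `Z` is what the dual update on `Y` drives to zero.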



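We now prove that Algorithm NSA converges under fairly mild conditions on the sequence $\{\rho_k\}_{k \in \mathbb{Z}_+}$
of penalty parameters. We first need the following lemma, which extends the similar result given in [6] to
partial splitting of variables.

Lemma 5.1. Suppose that $\delta > 0$. Let $\{X_k, Z_k, S_k, Y_k, \theta_k\}_{k \in \mathbb{Z}_+}$ be the sequence produced by Algorithm
NSA. Let $(X^*, X^*, S^*) = \mathrm{argmin}_{X,Z,S} \{ \|X\|_* + \xi \|S\|_1 : \frac{1}{2} \|Z + S - D\|_F^2 \le \frac{\delta^2}{2},\ X = Z \}$ be any optimal
solution, $Y^* \in \mathbb{R}^{m \times n}$ and $\theta^* \ge 0$ be any optimal Lagrangian duals corresponding to the constraints $X = Z$
and $\frac{1}{2} \|Z + S - D\|_F^2 \le \frac{\delta^2}{2}$, respectively. Then $\{ \|Z_k - X^*\|_F^2 + \rho_k^{-2} \|Y_k - Y^*\|_F^2 \}_{k \in \mathbb{Z}_+}$ is a non-increasing
sequence and

$$\sum_{k \in \mathbb{Z}_+} \|Z_{k+1} - Z_k\|_F^2 < \infty, \qquad \sum_{k \in \mathbb{Z}_+} \rho_k^{-2} \|Y_{k+1} - Y_k\|_F^2 < \infty,$$

$$\sum_{k \in \mathbb{Z}_+} \rho_k^{-1} \langle -Y_{k+1} + Y^*, S_{k+1} - S^* \rangle < \infty, \qquad \sum_{k \in \mathbb{Z}_+} \rho_k^{-1} \langle -\hat{Y}_{k+1} + Y^*, X_{k+1} - X^* \rangle < \infty,$$

$$\sum_{k \in \mathbb{Z}_+} \rho_k^{-1} \langle Y^* - Y_{k+1}, X^* + S^* - Z_{k+1} - S_{k+1} \rangle < \infty.$$

Proof. See Appendix B for the proof. □

Given the partially split SPCP problem, $\min_{X,Z,S} \{ \|X\|_* + \xi \|S\|_1 : X = Z,\ (Z, S) \in \chi \}$, let $\mathcal{L}$ be its
Lagrangian function

$$\mathcal{L}(X, Z, S; Y, \theta) = \|X\|_* + \xi \|S\|_1 + \langle Y, X - Z \rangle + \frac{\theta}{2} \left( \|Z + S - D\|_F^2 - \delta^2 \right). \qquad (5.1)$$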

Theorem 5.2. Suppose that $\delta > 0$. Let $\{X_k, Z_k, S_k, Y_k, \theta_k\}_{k \in \mathbb{Z}_+}$ be the sequence produced by Algorithm
NSA. Choose $\{\rho_k\}_{k \in \mathbb{Z}_+}$ such that
(i) $\sum_{k \in \mathbb{Z}_+} \frac{1}{\rho_k} = \infty$: Then $\lim_{k \in \mathbb{Z}_+} Z_k = \lim_{k \in \mathbb{Z}_+} X_k = X^*$, $\lim_{k \in \mathbb{Z}_+} S_k = S^*$ such that $(X^*, S^*) =
\mathrm{argmin} \{ \|X\|_* + \xi \|S\|_1 : \|X + S - D\|_F \le \delta \}$.
(ii) $\sum_{k \in \mathbb{Z}_+} \frac{1}{\rho_k^2} = \infty$: If $\|D - X^*\|_F \ne \delta$, then $\lim_{k \in \mathbb{Z}_+} \theta_k = \theta^* \ge 0$ and $\lim_{k \in \mathbb{Z}_+} Y_k = Y^*$ such that
$(X^*, X^*, S^*, Y^*, \theta^*)$ is a saddle point of the Lagrangian function $\mathcal{L}$ in (5.1). Otherwise, if $\|D -
X^*\|_F = \delta$, then there exists a limit point, $(Y^*, \theta^*)$, of the sequence $\{Y_k, \theta_k\}_{k \in \mathbb{Z}_+}$ such that $(Y^*, \theta^*) =
\mathrm{argmax}_{Y, \theta} \{ \mathcal{L}(X^*, X^*, S^*; Y, \theta) : \theta \ge 0 \}$.

Remark 5.1. Requiring $\sum_{k \in \mathbb{Z}_+} \frac{1}{\rho_k} = \infty$ is similar to the condition in Theorem 2 in [6], which is needed
to show that Algorithm I-ALM converges to an optimal solution of the robust PCA problem.

Remark 5.2. Let $D = X^0 + S^0 + \zeta^0$ such that $\|\zeta^0\|_F \le \delta$ and $(X^0, S^0)$ satisfies the assumptions of
Theorem 1.2. If $\|S^0\|_F > \sqrt{Cmn}\, \delta$, then with very high probability, $\|D - X^*\|_F > \delta$, where $C$ is the numerical
constant defined in Theorem 1.2. Therefore, most of the time in applications, one does not encounter the
case where $\|D - X^*\|_F = \delta$.



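Proof. From Lemma 5.1 and the fact that $X_{k+1} - Z_{k+1} = \frac{1}{\rho_k} (Y_{k+1} - Y_k)$ for all $k \ge 1$, we have

$$\infty > \sum_{k \in \mathbb{Z}_+} \rho_k^{-2} \|Y_{k+1} - Y_k\|_F^2 = \sum_{k \in \mathbb{Z}_+} \|X_{k+1} - Z_{k+1}\|_F^2.$$

Hence, $\lim_{k \in \mathbb{Z}_+} (X_k - Z_k) = 0$.

Let $(X^\#, X^\#, S^\#) = \mathrm{argmin}_{X,Z,S} \{ \|X\|_* + \xi \|S\|_1 : \frac{1}{2} \|Z + S - D\|_F^2 \le \frac{\delta^2}{2},\ X = Z \}$ be any optimal
solution, $Y^\# \in \mathbb{R}^{m \times n}$ and $\theta^\# \ge 0$ be any optimal Lagrangian duals corresponding to $X = Z$ and $\frac{1}{2} \|Z + S -
D\|_F^2 \le \frac{\delta^2}{2}$ constraints, respectively, and $f^* := \|X^\#\|_* + \xi \|S^\#\|_1$.

Moreover, let $\chi = \{(Z, S) \in \mathbb{R}^{m \times n} \times \mathbb{R}^{m \times n} : \|Z + S - D\|_F \le \delta\}$ and $\mathbf{1}_\chi(Z, S)$ denote the indicator
function of the closed convex set $\chi$, i.e. $\mathbf{1}_\chi(Z, S) = 0$ if $(Z, S) \in \chi$; otherwise, $\mathbf{1}_\chi(Z, S) = \infty$. Since the
sequence $\{(Z_k, S_k)\}_{k \in \mathbb{Z}_+}$ produced by NSA is a feasible sequence for the set $\chi$, we have $\mathbf{1}_\chi(Z_k, S_k) = 0$ for
all $k \ge 1$. Hence, the following inequality is true for all $k \ge 1$:

$$\begin{aligned}
\|X_k\|_* + \xi \|S_k\|_1 &= \|X_k\|_* + \xi \|S_k\|_1 + \mathbf{1}_\chi(Z_k, S_k), \\
&\le \|X^\#\|_* + \xi \|S^\#\|_1 + \mathbf{1}_\chi(X^\#, S^\#) - \langle -\hat{Y}_k, X^\# - X_k \rangle - \langle -Y_k, S^\# - S_k \rangle - \langle Y_k, X^\# + S^\# - Z_k - S_k \rangle, \\
&= f^* + \langle -\hat{Y}_k + Y^\#, X_k - X^\# \rangle + \langle -Y_k + Y^\#, S_k - S^\# \rangle + \langle Y^\# - Y_k, X^\# + S^\# - Z_k - S_k \rangle \\
&\quad + \langle Y^\#, Z_k - X_k \rangle, \qquad (5.2)
\end{aligned}$$

where the inequality follows from the convexity of norms and the fact that $-Y_k \in \xi\, \partial \|S_k\|_1$, $-\hat{Y}_k \in
\partial \|X_k\|_*$ and $(Y_k, Y_k) \in \partial \mathbf{1}_\chi(Z_k, S_k)$; the final equality follows from rearranging the terms and the fact
that $(X^\#, S^\#) \in \chi$.

From Lemma 5.1, we have

$$\sum_{k \in \mathbb{Z}_+} \rho_{k-1}^{-1} \left( \langle -\hat{Y}_k + Y^\#, X_k - X^\# \rangle + \langle -Y_k + Y^\#, S_k - S^\# \rangle + \langle Y^\# - Y_k, X^\# + S^\# - Z_k - S_k \rangle \right) < \infty.$$

Since $\sum_{k \in \mathbb{Z}_+} \frac{1}{\rho_k} = \infty$, there exists $\mathcal{K} \subset \mathbb{Z}_+$ such that

$$\lim_{k \in \mathcal{K}} \left( \langle -\hat{Y}_k + Y^\#, X_k - X^\# \rangle + \langle -Y_k + Y^\#, S_k - S^\# \rangle + \langle Y^\# - Y_k, X^\# + S^\# - Z_k - S_k \rangle \right) = 0. \qquad (5.3)$$

(5.3) and the fact that $\lim_{k \in \mathbb{Z}_+} Z_k - X_k = 0$ imply that along $\mathcal{K}$ the right-hand side of (5.2) converges to
$f^* = \|X^\#\|_* + \xi \|S^\#\|_1 = \min \{ \|X\|_* + \xi \|S\|_1 : (X, S) \in \chi \}$; hence along the $\mathcal{K}$ subsequence,
$\{ \|X_k\|_* + \xi \|S_k\|_1 \}_{k \in \mathcal{K}}$ is a bounded sequence.
Therefore, there exists $\mathcal{K}^* \subset \mathcal{K} \subset \mathbb{Z}_+$ such that $\lim_{k \in \mathcal{K}^*} (X_k, S_k) = (X^*, S^*)$. Also, since $\lim_{k \in \mathbb{Z}_+} Z_k - X_k = 0$
and $(Z_k, S_k) \in \chi$ for all $k \ge 1$, we also have $(X^*, S^*) = \lim_{k \in \mathcal{K}^*} (Z_k, S_k) \in \chi$. Since the limit of both sides of
(5.2) along $\mathcal{K}^*$ gives $\|X^*\|_* + \xi \|S^*\|_1 = \lim_{k \in \mathcal{K}^*} \|X_k\|_* + \xi \|S_k\|_1 \le f^*$ and $(X^*, S^*) \in \chi$, we conclude that
$(X^*, S^*) = \mathrm{argmin} \{ \|X\|_* + \xi \|S\|_1 : (X, S) \in \chi \}$.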
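It is also true that $(X^*, X^*, S^*)$ is an optimal solution to an equivalent problem: $\mathrm{argmin}_{X,Z,S} \{ \|X\|_* +
\xi \|S\|_1 : \frac{1}{2} \|Z + S - D\|_F^2 \le \frac{\delta^2}{2},\ X = Z \}$. Now, let $\bar{Y} \in \mathbb{R}^{m \times n}$ and $\bar{\theta} \ge 0$ be optimal Lagrangian duals
corresponding to $X = Z$ and $\frac{1}{2} \|Z + S - D\|_F^2 \le \frac{\delta^2}{2}$ constraints, respectively. From Lemma 5.1, it follows
that $\{ \|Z_k - X^*\|_F^2 + \rho_k^{-2} \|Y_k - \bar{Y}\|_F^2 \}_{k \in \mathbb{Z}_+}$ is a bounded non-increasing sequence. Hence, it has a unique limit
point, i.e.

$$\lim_{k \in \mathbb{Z}_+} \|Z_k - X^*\|_F^2 = \lim_{k \in \mathbb{Z}_+} \|Z_k - X^*\|_F^2 + \rho_k^{-2} \|Y_k - \bar{Y}\|_F^2 = \lim_{k \in \mathcal{K}^*} \|Z_k - X^*\|_F^2 + \rho_k^{-2} \|Y_k - \bar{Y}\|_F^2 = 0,$$

where the equalities follow from the facts that $\lim_{k \in \mathcal{K}^*} Z_k = X^*$, $\rho_k \nearrow \infty$ as $k \to \infty$ and $\{Y_k\}_{k \in \mathbb{Z}_+}$, $\{\hat{Y}_k\}_{k \in \mathbb{Z}_+}$
are bounded sequences. $\lim_{k \in \mathbb{Z}_+} \|Z_k - X^*\|_F = 0$ and $\lim_{k \in \mathbb{Z}_+} Z_k - X_k = 0$ imply that $\lim_{k \in \mathbb{Z}_+} X_k = X^*$.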



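Using Lemma 3.1 for the $k$-th subproblem given in Step 6 in Algorithm 5, we have

$$S_{k+1} = \mathrm{sign}\left( D - \left( X_{k+1} + \tfrac{1}{\rho_k} Y_k \right) \right) \odot \max \left\{ \left| D - \left( X_{k+1} + \tfrac{1}{\rho_k} Y_k \right) \right| - \xi\, \frac{\rho_k + \theta_k}{\rho_k\, \theta_k}\, E,\ 0 \right\}, \qquad (5.4)$$

$$Z_{k+1} = \frac{\theta_k}{\rho_k + \theta_k} (D - S_{k+1}) + \frac{\rho_k}{\rho_k + \theta_k} \left( X_{k+1} + \tfrac{1}{\rho_k} Y_k \right), \qquad (5.5)$$

where if $\|D - (X_{k+1} + \frac{1}{\rho_k} Y_k)\|_F \le \delta$, then $\theta_k = 0$; otherwise, $\theta_k > 0$ is the unique solution such that
$\phi_k(\theta_k) = \delta$, with

$$\phi_k(\theta) := \left\| \min \left\{ \frac{\xi}{\theta}\, E,\ \frac{\rho_k}{\rho_k + \theta} \left| D - \left( X_{k+1} + \tfrac{1}{\rho_k} Y_k \right) \right| \right\} \right\|_F. \qquad (5.6)$$

In the following, it is shown that the sequence $\{S_k\}_{k \in \mathbb{Z}_+}$ has a unique limit point $S^*$. Since $\lim_{k \in \mathbb{Z}_+} X_k = X^*$,
$\{Y_k\}_{k \in \mathbb{Z}_+}$ is a bounded sequence and $\rho_k \nearrow \infty$ as $k \to \infty$, we have $\lim_{k \in \mathbb{Z}_+} X_{k+1} + \frac{1}{\rho_k} Y_k = X^*$.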



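Case 1: $\|D - X^*\|_F \le \delta$. Previously, we have shown that there exists a subsequence $\mathcal{K}^* \subset \mathbb{Z}_+$ such that
$\lim_{k \in \mathcal{K}^*} (X_k, S_k) = (X^*, S^*) = \mathrm{argmin}_{X,S} \{ \|X\|_* + \xi \|S\|_1 : \|X + S - D\|_F \le \delta \}$. On the other hand, since
$\|D - X^*\|_F \le \delta$, $(X^*, 0)$ is a feasible solution. Hence, $\|X^*\|_* + \xi \|S^*\|_1 \le \|X^*\|_*$, which implies that $S^* = 0$.

$$\begin{aligned}
\|X_k\|_* + \xi \|S_k\|_1 &= \|X_k\|_* + \xi \|S_k\|_1 + \mathbf{1}_\chi(Z_k, S_k), \\
&\le \|X^*\|_* + \xi \|0\|_1 + \mathbf{1}_\chi(X^*, 0) - \langle -\hat{Y}_k, X^* - X_k \rangle - \langle -Y_k, 0 - S_k \rangle - \langle Y_k, X^* + 0 - Z_k - S_k \rangle, \\
&= \|X^*\|_* + \langle \hat{Y}_k, X^* - X_k \rangle + \langle Y_k, Z_k - X^* \rangle. \qquad (5.7)
\end{aligned}$$

Since the sequences $\{Y_k\}_{k \in \mathbb{Z}_+}$ and $\{\hat{Y}_k\}_{k \in \mathbb{Z}_+}$ are bounded and $\lim_{k \in \mathbb{Z}_+} X_k = \lim_{k \in \mathbb{Z}_+} Z_k = X^*$, taking the
limit on both sides of (5.7), we have

$$\|X^*\|_* + \xi \lim_{k \in \mathbb{Z}_+} \|S_k\|_1 = \lim_{k \in \mathbb{Z}_+} \|X_k\|_* + \xi \|S_k\|_1 \le \lim_{k \in \mathbb{Z}_+} \|X^*\|_* + \langle \hat{Y}_k, X^* - X_k \rangle + \langle Y_k, Z_k - X^* \rangle = \|X^*\|_*.$$

Therefore, $\lim_{k \in \mathbb{Z}_+} \|S_k\|_1 = 0$, which implies that $\lim_{k \in \mathbb{Z}_+} S_k = S^* = 0$.

Case 2: $\|D - X^*\|_F > \delta$. Since $\|D - (X_{k+1} + \frac{1}{\rho_k} Y_k)\|_F \to \|D - X^*\|_F > \delta$, there exists $K \in \mathbb{Z}_+$
such that for all $k \ge K$, $\|D - (X_{k+1} + \frac{1}{\rho_k} Y_k)\|_F > \delta$. For all $k \ge K$, $\phi_k(\cdot)$ is a continuous and strictly
decreasing function of $\theta$ for $\theta \ge 0$. Hence, the inverse function $\phi_k^{-1}(\cdot)$ exists around $\delta$ for all $k \ge K$. Thus,
$\phi_k(0) = \|D - (X_{k+1} + \frac{1}{\rho_k} Y_k)\|_F > \delta$ and $\lim_{\theta \to \infty} \phi_k(\theta) = 0$ imply that $\theta_k = \phi_k^{-1}(\delta) > 0$ for all $k \ge K$.
Moreover, $\phi_k(\theta) \le \phi(\theta) := \|\frac{\xi}{\theta} E\|_F$ implies that $\theta_k \le \frac{\xi \sqrt{mn}}{\delta}$ for all $k \ge K$. Therefore, $\{\theta_k\}_{k \in \mathbb{Z}_+}$ is a
bounded sequence, which has a convergent subsequence $\mathcal{K}_\theta \subset \mathbb{Z}_+$ such that $\lim_{k \in \mathcal{K}_\theta} \theta_k = \theta^*$. We also have
$\phi_k(\theta) \to \phi_\infty(\theta)$ pointwise for all $0 \le \theta \le \frac{\xi \sqrt{mn}}{\delta}$, where

$$\phi_\infty(\theta) := \left\| \min \left\{ \frac{\xi}{\theta}\, E,\ |D - X^*| \right\} \right\|_F. \qquad (5.8)$$

Since $\phi_k(\theta_k) = \delta$ for all $k \ge K$, we have

$$\delta = \lim_{k \in \mathcal{K}_\theta} \phi_k(\theta_k) = \lim_{k \in \mathcal{K}_\theta} \left\| \min \left\{ \frac{\xi}{\theta_k}\, E,\ \frac{\rho_k}{\rho_k + \theta_k} \left| D - \left( X_{k+1} + \tfrac{1}{\rho_k} Y_k \right) \right| \right\} \right\|_F = \phi_\infty(\theta^*). \qquad (5.9)$$

Note that since $\|D - X^*\|_F > \delta$, $\phi_\infty$ is invertible around $\delta$, i.e. $\phi_\infty^{-1}$ exists around $\delta$. Thus, $\theta^* = \phi_\infty^{-1}(\delta)$.
Since $\mathcal{K}_\theta$ is an arbitrary subsequence, we can conclude that $\theta^* := \lim_{k \in \mathbb{Z}_+} \theta_k = \phi_\infty^{-1}(\delta)$. Since there exists
$\theta^* > 0$ such that $\theta^* = \lim_{k \in \mathbb{Z}_+} \theta_k$, taking the limit on both sides of (5.4), we have

$$\lim_{k \in \mathbb{Z}_+} S_{k+1} = \mathrm{sign}(D - X^*) \odot \max \left\{ |D - X^*| - \frac{\xi}{\theta^*}\, E,\ 0 \right\}, \qquad (5.10)$$

and this completes the first part of the theorem.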

Now, we will show that if \\D — X*\\p ^ S, then the sequences {9k}kez+ and {Yk}kez+ have unique 
limits. Note that from (B.3|, it follows that Yk — Ok-i{Zk + Sk — D) for all fc > 1. First suppose that 

^ Yk)\\F -> \\D-X*\\f < S, there exists if e Z+ such that for all fc > K, 
for all k>K,ek 



\D-X*\\f < S. Since \\D-{Xk+i- 



Pk 



\D- {Xk+i + -j^ Yk)\\F < S. Thus, from Lemma 



which implies that 9* := 
S* = limfcgK;* Sk = lim^g^ 



Hmfc 



and 



3.1 



0, S, 



k+l 



0, 



k+l 



X. 



k+l 



limfegz+ Yk = limfegz+ Ok-i{Zk + Sk - D) 



— ^ Yk , 
Pk 

since 



Sk = 0, \imkez+ Zk = X* and ||D - X*\\f < 6. Now suppose that \\D- X*\\f > 



S. In Case 2 above we have shown that 6* = limfegz^ Ok- Hence, there exists Y* G 
Y* - limfcgz+ Ok-iiZk + Sk-D)= 9*{X* + S* - D). 

Suppose that '^kez ~^ ~ From Lemma 5.1 we have X^fcg 

Pk 

series can be written as 



such that 



Zk+i — Zk\W < oo. Equivalently, the 



oo > 



/cez+ 



-'k+l 



'■'kllF 



E 

fcez+ 



Pk' 



\Yk 



k+l 



Yu 



(5.11) 



Since X^feez ^ ~ there exists a subsequence JC C such that \m\k£K: ll^fc+i ^ ^fc+ill|^ = 0. Hence, 



VrnikeK pjW Zk+i -Zkllp = , i-C- MnikeK PkiZk+i 
Using ([b1|), (|R2|) and (fRSl), we have 



Zk) = 0. 



e d\\Xk+i\U + 9k(Zk+i + 5fe+i ~D)+ pk{Zk+i - Zk), 
e + ^^fe(^fe+i + - D). 



If II D - X*|| ^ 5, then there exists Y* e 



such that Y* = limfegz^ 6'fc_i(Zfc + S'fe - _D) = 
S* — £>). Taking the limit of (5. 12), (5. 13) along JC C Z+ and using the fact that VmikeK Pk{Zk+i 
we have 



(5.12) 
(5.13) 

9*{X* + 
Zk) = 0, 



e d\\x*\l - 
Oe^a||^*||i 



'*iX* 



S* -D), 
-S* -D) 



(5.14) 
(5.15) 



( [57L4| ) and ( [57L5| together imply that {X*,S*), Y* = 6'*(X' 

tions for the problem minx, z,s{||-'^||* +C ll'5'lli : ^\\Z + S — 
is a saddle point of the Lagrangian function 



5** — D) and 9* satisfy KKT optimality condi- 
^, X = Z}. Hence, {X* , X* , S* ,Y* ,9*) 



^111 < 



C{X,Z,S;Y,9)^\\X\U+^ \\Sh + {Y, X - Z) + - {\\Z + S - 



DWl 



'fe < Smce 



lim 



Suppose that \\D-X*\\f = 6. Fix fc > 0. If \\D-{Xk+i + -^ Yk)\\F < S, then 9k = 0. Otherwise, 9k > 

and as shown in case 2 in the first part of the proof 9k < ^v"™" _ Thus, for any fc > 0, < 
{9k}k£i+ is a bounded sequence, there exists a further subsequence ICg C JC such that 
and y* := limfcgK;^ 6'fe-i(^fc + 5^ - D) = 6I*(X* + S* - D) exist. Thus, taking the limit of ( |5.12p ,( |5.13[ ) 
along JCg C Z+ and using the facts that limkeK Pk{Zk+i 
S* = limfcgz_|_ Sk exist, we conclude that {X*,X*,S*,Y* 
C{X,Z,S;Y,9). □ 
Zk) 



and X* 



lim 



ke2 



Xk = limi, 



fee? 



Zh 



is a saddle point of the Lagrangian function 



6. Numerical experiments. Our preliminary numerical experiments showed that among the four algorithms discussed in this paper, NSA is the fastest. It also has very few parameters that need to be tuned. Therefore, we only report the results for NSA. We conducted two sets of numerical experiments with NSA to solve (1.4), where $\xi = \frac{1}{\sqrt{\max\{m,n\}}}$. In the first set we solved randomly generated instances of the stable principal component pursuit problem. In this setting, first we tested only NSA to see how the run times scale with respect to problem parameters and size; then we compared NSA with another alternating direction augmented Lagrangian algorithm, ASALM [10]. In the second set of experiments, we ran NSA and ASALM to extract moving objects from a noisy airport security video [5].

6.1. Random Stable Principal Component Pursuit Problems. We tested NSA on randomly generated stable principal component pursuit problems. The data matrices for these problems, $D = X^0 + S^0 + \zeta^0$, were generated as follows:

i. $X^0 = UV^T$, such that $U \in \mathbb{R}^{n\times r}$, $V \in \mathbb{R}^{n\times r}$ for $r = c_r n$, and $U_{ij} \sim \mathcal{N}(0,1)$, $V_{ij} \sim \mathcal{N}(0,1)$ for all $i,j$ are independent standard Gaussian variables and $c_r \in \{0.05, 0.1\}$,
ii. $\Lambda \subset \{(i,j) : 1 \le i,j \le n\}$ such that the cardinality of $\Lambda$ is $|\Lambda| = p$ for $p = c_p n^2$ and $c_p \in \{0.05, 0.1\}$,
iii. $S^0_{ij} \sim \mathcal{U}[-100, 100]$ for all $(i,j) \in \Lambda$ are independent uniform random variables between $-100$ and $100$,
iv. $\zeta^0_{ij} \sim \varrho\,\mathcal{N}(0,1)$ for all $i,j$ are independent Gaussian variables.
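The generation procedure in steps i to iv can be sketched in a few lines. The following is a minimal pure-Python illustration on a small instance, not the authors' MATLAB code; the function name `generate_spcp_instance`, the `seed` argument, and the rounding of $c_r n$ and $c_p n^2$ to integers are assumptions made for this example.

```python
import random


def generate_spcp_instance(n, c_r, c_p, varrho, seed=0):
    """Return (D, X0, S0) as n-by-n lists of rows, following steps i-iv.

    Assumption: r = c_r * n and p = c_p * n^2 are rounded to the nearest integer.
    """
    rng = random.Random(seed)
    r = max(1, round(c_r * n))   # rank of the low-rank component
    p = round(c_p * n * n)       # number of nonzero entries of S0

    # i. X0 = U V^T with independent standard Gaussian entries in U and V
    U = [[rng.gauss(0.0, 1.0) for _ in range(r)] for _ in range(n)]
    V = [[rng.gauss(0.0, 1.0) for _ in range(r)] for _ in range(n)]
    X0 = [[sum(U[i][k] * V[j][k] for k in range(r)) for j in range(n)]
          for i in range(n)]

    # ii. support set Lambda with exactly p index pairs, chosen uniformly
    support = rng.sample(range(n * n), p)

    # iii. uniform values in [-100, 100] on the support, zero elsewhere
    S0 = [[0.0] * n for _ in range(n)]
    for idx in support:
        i, j = divmod(idx, n)
        S0[i][j] = rng.uniform(-100.0, 100.0)

    # iv. additive Gaussian noise with standard deviation varrho
    D = [[X0[i][j] + S0[i][j] + varrho * rng.gauss(0.0, 1.0) for j in range(n)]
         for i in range(n)]
    return D, X0, S0
```

For the problem sizes used in the experiments (n up to 1500), a vectorized linear algebra library would be the practical choice; the nested lists here only keep the sketch dependency-free.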



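We created 10 random problems of size $n \in \{500, 1000, 1500\}$, i.e. $D \in \mathbb{R}^{n\times n}$, for each of the two choices of $c_r$ and $c_p$ using the procedure described above, where $\varrho$ was set such that the signal-to-noise ratio of $D$ is either 80dB or 45dB. The signal-to-noise ratio of $D$ is given by

$$\mathrm{SNR}(D) = 10\log_{10}\left(\frac{E\left[\|X^0+S^0\|_F^2\right]}{E\left[\|\zeta^0\|_F^2\right]}\right) = 10\log_{10}\left(\frac{c_r n + c_p\frac{100^2}{3}}{\varrho^2}\right). \qquad (6.1)$$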



Hence, for a given SNR value, we selected $\varrho$ according to (6.1). Table 6.1 displays the $\varrho$ values we used in our experiments. As in [10], we set $\delta = \sqrt{(n+\sqrt{8n})}\,\varrho$ in (1.4) in the first set of experiments for both NSA and ASALM.

Table 6.1
$\varrho$ values depending on the experimental setting

SNR    n      c_r=0.05 c_p=0.05   c_r=0.05 c_p=0.1   c_r=0.1 c_p=0.05   c_r=0.1 c_p=0.1
80dB   500    0.0014              0.0019             0.0015             0.0020
       1000   0.0015              0.0020             0.0016             0.0021
       1500   0.0016              0.0020             0.0018             0.0022
45dB   500    0.0779              0.1064             0.0828             0.1101
       1000   0.0828              0.1101             0.0918             0.1171
       1500   0.0874              0.1136             0.1001             0.1236
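As a sanity check on (6.1), the noise level can be computed directly from a target SNR by solving (6.1) for $\varrho$. The sketch below does this and also evaluates the constraint radius $\delta$; the function names `varrho_from_snr` and `delta_from_varrho` are hypothetical and chosen only for this illustration.

```python
import math


def varrho_from_snr(n, c_r, c_p, snr_db):
    """Solve (6.1) for the noise standard deviation varrho."""
    # Per-entry signal power: c_r * n from X0 = U V^T, plus c_p * 100^2 / 3
    # from the uniform entries of S0 on a fraction c_p of the positions.
    signal_power = c_r * n + c_p * 100.0 ** 2 / 3.0
    return math.sqrt(signal_power / 10.0 ** (snr_db / 10.0))


def delta_from_varrho(n, varrho):
    """Constraint radius delta = sqrt(n + sqrt(8 n)) * varrho."""
    return math.sqrt(n + math.sqrt(8.0 * n)) * varrho


if __name__ == "__main__":
    for snr_db in (80, 45):
        for n in (500, 1000, 1500):
            row = [varrho_from_snr(n, c_r, c_p, snr_db)
                   for c_r in (0.05, 0.1) for c_p in (0.05, 0.1)]
            print(snr_db, n, ["%.4f" % v for v in row])
```

Running the loop is a quick way to compare against the entries of Table 6.1; for example, n = 500, c_r = c_p = 0.05 and 80dB give a value close to 0.0014.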

Our code for NSA was written in MATLAB 7.2 and can be found at http://www.columbia.edu/~nsa2106. We terminated the algorithm when

$$\frac{\|(X_{k+1},S_{k+1})-(X_k,S_k)\|_F}{\|(X_k,S_k)\|_F+1} \le \ldots \qquad (6.2)$$
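A relative-change stopping test of the form (6.2) can be sketched as follows. This is an illustration only: the `+ 1` in the denominator follows our reading of (6.2), and the tolerance `tol` is left as a parameter because its value is an assumption of the caller, not something fixed by this sketch. Matrices are plain lists of rows, and the function names are hypothetical.

```python
import math


def pair_frobenius_norm(X, S):
    """Frobenius norm of the stacked pair (X, S)."""
    return math.sqrt(sum(v * v for M in (X, S) for row in M for v in row))


def _difference(A, B):
    """Entrywise difference A - B of two matrices of equal shape."""
    return [[a - b for a, b in zip(row_a, row_b)] for row_a, row_b in zip(A, B)]


def relative_change(X_new, S_new, X_old, S_old):
    """Left-hand side of the stopping test: change relative to ||(X_k, S_k)||_F + 1."""
    change = pair_frobenius_norm(_difference(X_new, X_old), _difference(S_new, S_old))
    return change / (pair_frobenius_norm(X_old, S_old) + 1.0)


def should_stop(X_new, S_new, X_old, S_old, tol):
    """True when the relative change between consecutive iterates is at most tol."""
    return relative_change(X_new, S_new, X_old, S_old) <= tol
```

Adding 1 to the denominator keeps the ratio well defined when the iterates are close to zero, which is a common reason for this form of test.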



The results of our experiments are displayed in Tables 6.2 and 6.3. In Table 6.2, the row labeled CPU lists the running time of NSA in seconds and the row labeled SVD# lists the number of partial singular value decompositions (SVDs) computed by NSA. The minimum, average and maximum CPU times and numbers of partial SVDs taken over the 10 random instances are given for each choice of $n$, $c_r$ and $c_p$ values. Table C.3 and Table C.4 in the appendix list additional error statistics.



With the stopping condition given in (6.2 1, the solutions produced by NSA have 

-4 



ap- 



proximately 1.5 X 10"* when SNR(i:i) = 80dB and 5 x 10"^ ^^len SnK{D) = 45dB, regardless of the 
problem dimension n and the problem parameters related to the rank and sparsity of D, i.e. Cr and Cp. After 
thresholding the singular values of X*"' that were less than 1 x 10""'^^, NSA found the true rank in all 120 

random problems solved when SNR(D) = 80dB, and it found the true rank for 113 out of 120 problems when SNR(D) = 45dB, while for 6 of the remaining problems rank($X^{sol}$) is off from rank($X^0$) only by 1. Table 6.2 shows that the number of partial SVDs was a very slightly increasing function of $n$, $c_r$ and $c_p$. Moreover, Table 6.3 shows that the relative error of the solution $(X^{sol}, S^{sol})$ was almost constant for different $n$, $c_r$ and $c_p$ values.
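The relative errors reported in Table 6.3 are ratios of Frobenius norms, $\|X^{sol}-X^0\|_F/\|X^0\|_F$ and $\|S^{sol}-S^0\|_F/\|S^0\|_F$. A minimal sketch of that computation is given below, with matrices stored as lists of rows; the function names are hypothetical.

```python
import math


def frobenius_norm(M):
    """Frobenius norm of a matrix given as a list of rows."""
    return math.sqrt(sum(v * v for row in M for v in row))


def relative_error(M_sol, M_true):
    """||M_sol - M_true||_F / ||M_true||_F; M_true must be nonzero."""
    diff = [[a - b for a, b in zip(row_sol, row_true)]
            for row_sol, row_true in zip(M_sol, M_true)]
    return frobenius_norm(diff) / frobenius_norm(M_true)
```

Applied once to the low-rank pair and once to the sparse pair, this yields the two numbers listed in each cell of the table.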



Table 6.2
NSA: Solution time for decomposing $D \in \mathbb{R}^{n\times n}$, $n \in \{500, 1000, 1500\}$

                      c_r=0.05 c_p=0.05   c_r=0.05 c_p=0.1   c_r=0.1 c_p=0.05   c_r=0.1 c_p=0.1
SNR    n      Field   min/avg/max         min/avg/max        min/avg/max        min/avg/max
80dB   500    SVD#    9/9.0/9             9/9.5/10           10/10.0/10         11/11/11
              CPU     3.2/4.4/5.1         3.6/5.1/6.6        4.3/5.2/6.4        5.0/6.2/8.1
       1000   SVD#    9/9.9/10            10/10.0/10         11/11/11           12/12.0/12
              CPU     16.5/19.6/22.4      14.6/20.7/24.3     25.2/26.9/29.1     27.9/31.2/36.3
       1500   SVD#    10/10.0/10          10/10.9/11         12/12.0/12         12/12.2/13
              CPU     38.6/44.1/46.6      43.7/48.6/51.9     78.6/84.1/90.8     80.7/97.7/155.2
45dB   500    SVD#    6/6/6               6/6.9/7            7/7.1/8            8/8/8
              CPU     2.3/2.9/4.2         2.9/3.6/4.5        2.9/3.9/6.2        3.5/4.2/6.0
       1000   SVD#    7/7.0/7             7/7.0/7            8/8.1/9            9/9.0/9
              CPU     11.5/13.4/17.4      10.6/13.3/17.9     17.1/18.7/20.7     19.7/23.8/28.9
       1500   SVD#    7/7.9/8             8/8.0/8            9/9.0/9            9/9.0/9
              CPU     34.1/37.7/44.0      30.7/37.1/45.6     55.6/59.0/63.7     55.9/59.7/64.8



Table 6.3
NSA: Solution accuracy for decomposing $D \in \mathbb{R}^{n\times n}$, $n \in \{500, 1000, 1500\}$

                                                c_r=0.05 c_p=0.05   c_r=0.05 c_p=0.1   c_r=0.1 c_p=0.05   c_r=0.1 c_p=0.1
SNR    n      Relative Error                    avg / max           avg / max          avg / max          avg / max
80dB   500    ||X^sol - X^0||_F / ||X^0||_F     4.0E-4 / 4.2E-4     5.8E-4 / 8.5E-4    3.6E-4 / 3.9E-4    4.4E-4 / 4.5E-4
              ||S^sol - S^0||_F / ||S^0||_F     1.7E-4 / 1.8E-4     1.6E-4 / 2.5E-4    1.6E-4 / 1.8E-4    1.3E-4 / 1.3E-4
       1000   ||X^sol - X^0||_F / ||X^0||_F     2.0E-4 / 2.4E-4     3.8E-4 / 4.1E-4    2.2E-4 / 2.2E-4    2.8E-4 / 2.9E-4
              ||S^sol - S^0||_F / ||S^0||_F     1.2E-4 / 1.4E-4     1.5E-4 / 1.6E-4    1.2E-4 / 1.3E-4    1.1E-4 / 1.1E-4
       1500   ||X^sol - X^0||_F / ||X^0||_F     1.8E-4 / 2.2E-4     2.1E-4 / 2.6E-4    1.3E-4 / 1.3E-4    2.8E-4 / 2.9E-4
              ||S^sol - S^0||_F / ||S^0||_F     1.3E-4 / 1.6E-4     9.6E-5 / 1.1E-4    8.1E-5 / 8.5E-5    1.3E-4 / 1.4E-4
45dB   500    ||X^sol - X^0||_F / ||X^0||_F     6.0E-3 / 6.2E-3     8.0E-3 / 9.2E-3    6.1E-3 / 6.3E-3    8.1E-3 / 8.2E-3
              ||S^sol - S^0||_F / ||S^0||_F     2.1E-3 / 2.2E-3     2.3E-3 / 2.7E-3    2.2E-3 / 2.3E-3    2.7E-3 / 2.9E-3
       1000   ||X^sol - X^0||_F / ||X^0||_F     4.1E-3 / 4.2E-3     6.1E-3 / 6.2E-3    4.6E-3 / 4.7E-3    6.0E-3 / 6.5E-3
              ||S^sol - S^0||_F / ||S^0||_F     1.9E-3 / 1.9E-3     2.4E-3 / 2.5E-3    2.3E-3 / 3.5E-3    3.1E-3 / 3.7E-3
       1500   ||X^sol - X^0||_F / ||X^0||_F     3.4E-3 / 3.6E-3     4.7E-3 / 4.7E-3    3.9E-3 / 4.0E-3    5.3E-3 / 5.3E-3
              ||S^sol - S^0||_F / ||S^0||_F     1.8E-3 / 1.8E-3     2.3E-3 / 2.3E-3    2.6E-3 / 3.5E-3    3.1E-3 / 3.1E-3



Next, we compared NSA with ASALM [10] for a fixed problem size, i.e. $n = 1500$ where $D \in \mathbb{R}^{n\times n}$. In all the numerical experiments, we terminated NSA according to (6.2). For random problems with SNR(D) = 80dB, we terminated ASALM according to (6.2). However, for random problems with SNR(D) = 45dB, ASALM produced solutions with 99% relative errors when (6.2) was used. Therefore, for random problems with SNR(D) = 45dB, we terminated ASALM either when it computed a solution with better relative errors than the NSA solution for the same problem or when an iterate satisfied (6.2) with the right-hand side replaced by $0.1\varrho$. The code for ASALM was obtained from the authors of [10].
The comparison results are displayed in Table 6.5 and Table 6.6. In Table 6.5, the row labeled CPU lists the running time of each algorithm in seconds and the row labeled SVD# lists the number of partial SVD computations of each algorithm. In Table 6.5, the minimum, average and maximum of CPU times and the number of partial SVD computations of each algorithm taken over the 10 random instances are given for each of the two choices of $c_r$ and $c_p$. Moreover, Table C.1 and Table C.2 given in the appendix list different error statistics.

We used PROPACK [J] for computing partial singular value decompositions. In order to estimate the 
rank of X^, we followed the scheme proposed in Equation (17) in [5^. 

Both NSA and ASALM found the true rank in all 40 random problems solved when SNR(D) = 80dB. NSA found the true rank for 39 out of 40 problems with n = 1500 when SNR(D) = 45dB, while for the remaining 1 problem rank($X^{sol}$) is off from rank($X^0$) only by 1. On the other hand, when SNR(D) = 45dB, ASALM could not find the true rank in any of the test problems. For each of the four problem settings corresponding to different $c_r$ and $c_p$ values, in Table 6.4 we report the average and maximum of rank($X^{sol}$)
over 10 random instances, after thresholding the singular values of AT''"' that were less than 1 x 10^^^. 
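The rank estimate used for Table 6.4 amounts to counting the singular values that survive the threshold and then summarizing over the random instances. The sketch below illustrates this under stated assumptions: the singular values are supplied by the caller, the threshold is a parameter rather than a fixed constant, and the names `estimated_rank` and `avg_and_max` are hypothetical.

```python
def estimated_rank(singular_values, threshold):
    """Number of singular values kept after zeroing those below the threshold."""
    return sum(1 for s in singular_values if s >= threshold)


def avg_and_max(ranks):
    """Average and maximum of the estimated ranks over a set of instances."""
    return sum(ranks) / len(ranks), max(ranks)
```

For instance, ten instances with nine estimated ranks of 150 and one of 151 give an average of 150.1 and a maximum of 151.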



Table 6.4
NSA vs ASALM: rank($X^{sol}$) values for problems with n = 1500, SNR(D) = 45dB

         rank(X^0) = 75                          rank(X^0) = 150
         c_r=0.05 c_p=0.05   c_r=0.05 c_p=0.1    c_r=0.1 c_p=0.05   c_r=0.1 c_p=0.1
Alg.     avg / max           avg / max           avg / max          avg / max
NSA      75 / 75             75 / 75             150.1 / 151        150 / 150
ASALM    175.8 / 177         179 / 207           222.4 / 224        201.9 / 204

Table 6.5 shows that for all of the problem classes, the number of partial SVDs required by ASALM was more than twice the number that NSA required. On the other hand, there was a big difference in CPU times; this difference can be explained by the fact that ASALM required more leading singular values than NSA did per partial SVD computation. Table 6.6 shows that although the relative errors of the low-rank components produced by NSA were slightly better, the relative errors of the sparse components produced by NSA were significantly better than those produced by ASALM. Finally, in Figure 6.1 we plot the decomposition of $D = X^0 + S^0 + \zeta^0 \in \mathbb{R}^{n\times n}$ generated by NSA, where rank($X^0$) = 75, $\|S^0\|_0 = 112{,}500$ and SNR(D) = 45dB. In the first row, we plot 1500 randomly selected components of $S^0$ and the 100 leading singular values of $X^0$. In the second row, we plot the same components of $S^{sol}$ and the 100 leading singular values of $X^{sol}$ produced by NSA. In the third row, we plot the absolute errors of $S^{sol}$ and $X^{sol}$. Note that the scales of the graphs showing absolute errors of $S^{sol}$ and $X^{sol}$ are larger than those of $S^0$ and $X^0$. And in the
fourth row, we plot the same 1500 random components of C*'. When we compare the absolute error graphs 
of S^°^ and A"""' with the graph showing C", we can confirm that the solution produced by NSA is inline 
with Theorem O 



Table 6.5
NSA vs ASALM: Solution time for decomposing $D \in \mathbb{R}^{n\times n}$, n = 1500

                       c_r=0.05 c_p=0.05    c_r=0.05 c_p=0.1     c_r=0.1 c_p=0.05        c_r=0.1 c_p=0.1
SNR    Alg.    Field   min/avg/max          min/avg/max          min/avg/max             min/avg/max
80dB   NSA     SVD#    10/10.0/10           10/10.9/11           12/12.0/12              12/12.2/13
               CPU     38.6/44.1/46.6       43.7/48.6/51.9       78.6/84.1/90.8          80.7/97.7/155.2
       ASALM   SVD#    22/22.0/22           20/20.0/20           29/29.0/29              29/29.4/30
               CPU     657.3/677.8/736.2    809.7/850.0/874.7    1277.3/1316.1/1368.6    1833.2/1905.2/2004.7
45dB   NSA     SVD#    7/7.9/8              8/8.0/8              9/9.0/9                 9/9.0/9
               CPU     34.1/37.7/44.0       30.7/37.1/45.6       55.6/59.0/63.7          55.9/59.7/64.8
       ASALM   SVD#    21/21/21             18/18.5/19           28/28.0/28              27/27.3/28
               CPU     666.6/686.9/708.9    835.7/857.1/887.2    1201.9/1223.2/1277.5    1677.1/1739.1/1846.5
Table 6.6
NSA vs ASALM: Solution accuracy for decomposing $D \in \mathbb{R}^{n\times n}$, n = 1500

                                                 c_r=0.05 c_p=0.05   c_r=0.05 c_p=0.1   c_r=0.1 c_p=0.05   c_r=0.1 c_p=0.1
SNR    Alg.    Relative Error                    avg / max           avg / max          avg / max          avg / max
80dB   NSA     ||X^sol - X^0||_F / ||X^0||_F     1.8E-4 / 2.2E-4     2.1E-4 / 2.6E-4    1.3E-4 / 1.3E-4    2.8E-4 / 2.9E-4
               ||S^sol - S^0||_F / ||S^0||_F     1.3E-4 / 1.6E-4     9.6E-5 / 1.1E-4    8.1E-5 / 8.5E-5    1.3E-4 / 1.4E-4
       ASALM   ||X^sol - X^0||_F / ||X^0||_F     3.9E-4 / 4.2E-4     8.4E-4 / 8.8E-4    6.6E-4 / 6.8E-4    1.4E-3 / 1.4E-3
               ||S^sol - S^0||_F / ||S^0||_F     5.7E-4 / 6.2E-4     7.6E-4 / 8.0E-4    1.1E-3 / 1.1E-3    1.4E-3 / 1.4E-3
45dB   NSA     ||X^sol - X^0||_F / ||X^0||_F     3.4E-3 / 3.6E-3     4.7E-3 / 4.7E-3    3.9E-3 / 4.0E-3    5.3E-3 / 5.3E-3
               ||S^sol - S^0||_F / ||S^0||_F     1.8E-3 / 1.8E-3     2.3E-3 / 2.3E-3    2.6E-3 / 3.5E-3    3.1E-3 / 3.1E-3
       ASALM   ||X^sol - X^0||_F / ||X^0||_F     4.6E-3 / 4.8E-3     7.3E-3 / 8.4E-3    4.7E-3 / 4.7E-3    7.8E-3 / 7.9E-3
               ||S^sol - S^0||_F / ||S^0||_F     4.8E-3 / 4.9E-3     5.8E-3 / 7.0E-3    5.5E-3 / 5.5E-3    7.3E-3 / 7.5E-3



Fig. 6.1. NSA: Comparison of randomly selected 1500 components of with absolute errors of those components in 3"°' 
and a{X''°'). D 6 K">^", n = 1500, SNR(D) = A5dB 







6.2. Foreground Detection on a Noisy Video. We used NSA and ASALM to extract moving objects in an airport security video [5], which is a sequence of 201 grayscale frames of size 144 × 176. We assume that the airport security video [5] was not corrupted by Gaussian noise. We formed the $i$-th column of the data matrix $D$ by stacking the columns of the $i$-th frame into a long vector, i.e. $D$ is in $\mathbb{R}^{25344\times 201}$. In order to have a noisy video with a signal-to-noise ratio (SNR) of 20dB, given $D$, we chose $\varrho = \|D\|_F/(\sqrt{144\times 176\times 201}\;10^{SNR/20})$ and then obtained a noisy $D$ by $D = D + \varrho\,\mathrm{randn}(144*176, 201)$, where randn(m, n) produces a random matrix with independent standard Gaussian entries. Solving for $(X^*, S^*) = \mathrm{argmin}_{X,S\in\mathbb{R}^{25344\times 201}}\{\|X\|_* + \xi\|S\|_1 : \|X+S-D\|_F \le \delta\}$, we decompose $D$ into a low rank matrix $X^*$ and a sparse matrix $S^*$. We estimate the $i$-th frame background image with the $i$-th column of $X^*$ and estimate the $i$-th frame moving object with the $i$-th column of $S^*$. Both algorithms are terminated
when $\frac{\|(X_{k+1},S_{k+1})-(X_k,S_k)\|_F}{\|(X_k,S_k)\|_F+1} \le \ldots \times 10^{-4}$. The recovery statistics of each algorithm are displayed in Table 6.7. $(X^{sol}, S^{sol})$ denote the variables corresponding to the low-rank and sparse components of $D$, respectively, when the algorithm of interest terminates. Figure 7.1 and Figure 7.2 show the 35-th, 100-th and 125-th frames of the noise-added airport security video [5] in their first row of images. The second and third rows in these figures have the recovered background and foreground images of the selected frames, respectively. Even though the visual quality of the recovered background and foreground is very similar, Table 6.7 shows that both the number of partial SVDs and the CPU time of NSA are significantly less than those for ASALM.
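The construction of the noisy data matrix described in this subsection can be sketched as follows. This is a pure-Python illustration on small inputs, not the code used for the experiments; the function names and the `seed` argument are assumptions, and `random.gauss` stands in for MATLAB's `randn`.

```python
import math
import random


def frames_to_matrix(frames):
    """Stack the columns of each frame into one long column of D."""
    columns = []
    for frame in frames:
        height, width = len(frame), len(frame[0])
        # column-major order: first column of the frame, then the second, ...
        columns.append([frame[r][c] for c in range(width) for r in range(height)])
    # D[row][i] is entry `row` of the i-th stacked frame
    return [[col[row] for col in columns] for row in range(len(columns[0]))]


def noise_level(D, snr_db):
    """varrho = ||D||_F / (sqrt(number of entries) * 10^(SNR/20))."""
    m, n = len(D), len(D[0])
    fro = math.sqrt(sum(v * v for row in D for v in row))
    return fro / (math.sqrt(m * n) * 10.0 ** (snr_db / 20.0))


def add_gaussian_noise(D, snr_db, seed=0):
    """Return D + varrho * G, with G having independent standard Gaussian entries."""
    rng = random.Random(seed)
    varrho = noise_level(D, snr_db)
    return [[v + varrho * rng.gauss(0.0, 1.0) for v in row] for row in D]
```

With 201 frames of size 144 × 176, `frames_to_matrix` would produce a matrix with 25344 rows and 201 columns, matching the dimensions stated above.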

Table 6.7
NSA vs ASALM: Recovery statistics for foreground detection on a noisy video

Alg.     CPU      SVD#   ||X^sol||_*   ||S^sol||_1    rank(X^sol)   ||X^sol + S^sol - D||_F / ||D||_F
NSA      160.8    19     398662.9      76221854.1     81            0.00068
ASALM    910.0    94     401863.6      75751977.1     89            0.00080



7. Acknowledgements. We would like to thank Min Tao for providing the code for ASALM.

Fig. 7.1. Background extraction from a video with 20dB SNR using NSA

Fig. 7.2. Background extraction from a video with 20dB SNR using ASALM
he closed convex functions and define 



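$$Q^{\psi}(Z,S\,|\,X) := \psi(Z,S) + \phi(X) + \langle \gamma^{\phi}(X), Z - X\rangle + \frac{\rho}{2}\|Z-X\|_F^2, \qquad (A.1)$$
$$Q^{\phi}(Z\,|\,X,S) := \phi(Z) + \psi(X,S) + \langle \gamma^{\psi}_x(X,S), Z - X\rangle + \frac{\rho}{2}\|Z-X\|_F^2, \qquad (A.2)$$
$$\left(p^{\psi}_x(X),\, p^{\psi}_s(X)\right) := \mathrm{argmin}_{Z,S}\; Q^{\psi}(Z,S\,|\,X), \qquad (A.3)$$
$$p^{\phi}(X,S) := \mathrm{argmin}_{Z}\; Q^{\phi}(Z\,|\,X,S), \qquad (A.4)$$

where $\gamma^{\phi}(X)$ is any subgradient in the subdifferential $\partial\phi$ at the point $X$ and $\left(\gamma^{\psi}_x(X,S), \gamma^{\psi}_s(X,S)\right)$ is any subgradient in the subdifferential $\partial\psi$ at the point $(X,S)$.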

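Lemma A.2. Let $\phi$, $\psi$, $Q^{\phi}$, $Q^{\psi}$, $p^{\psi}_x$, $p^{\psi}_s$, $p^{\phi}$, $\gamma^{\phi}$, $\gamma^{\psi}_x$, $\gamma^{\psi}_s$ be as given in Definition A.1, and $\Phi(X,S) := \phi(X) + \psi(X,S)$. Let $X^0 \in \mathbb{R}^{m\times n}$ and define $\hat{X} := p^{\psi}_x(X^0)$ and $\hat{S} := p^{\psi}_s(X^0)$. If

$$\Phi(\hat{X},\hat{S}) \le Q^{\psi}(\hat{X},\hat{S}\,|\,X^0), \qquad (A.5)$$

then for any $(X,S) \in \mathbb{R}^{m\times n}\times\mathbb{R}^{m\times n}$,

$$\frac{2}{\rho}\left(\Phi(X,S) - \Phi(\hat{X},\hat{S})\right) \ge \|\hat{X} - X\|_F^2 - \|X - X^0\|_F^2. \qquad (A.6)$$

Moreover, if

$$\Phi\left(p^{\phi}(\hat{X},\hat{S}),\hat{S}\right) \le Q^{\phi}\left(p^{\phi}(\hat{X},\hat{S})\,|\,\hat{X},\hat{S}\right), \qquad (A.7)$$

then for any $(X,S) \in \mathbb{R}^{m\times n}\times\mathbb{R}^{m\times n}$,

$$\frac{2}{\rho}\left(\Phi(X,S) - \Phi\left(p^{\phi}(\hat{X},\hat{S}),\hat{S}\right)\right) \ge \|X - p^{\phi}(\hat{X},\hat{S})\|_F^2 - \|X - \hat{X}\|_F^2. \qquad (A.8)$$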
Proof. Let $X^0 \in \mathbb{R}^{m\times n}$ satisfy (A.5). Then for any $(X,S) \in \mathbb{R}^{m\times n}\times\mathbb{R}^{m\times n}$, we have



(A.9) 



First order optimahty conditions for (A.3) and ip being a closed convex function guarantee that there exists 
i-ft(^X,Sy-ft{x,S^] edip(^X,S^ such that 



{X, S) + 7^^°) + p[x - X") = 0, 

jf[x,s) 



0, 



(A.IO) 
(A.ll) 



where d'ip(^X, denotes the subdifferential of ?/;(.,.) at the point (^X, S 
Moreover, using the convexity of ^p{., .) and 0(.), we have 

^{x, s) > v^(x, s) + (7I [x, s),x-x) + (^^f (x, s),s-s), 

cl,{X)>cf,{X") + {j^{X'),X~X'). 
These two inequahties and ( |A.ll ) together imply 

s) > 7/^(1, + (7^ (1, s),x-x) + + (jHx°),x - 



(A.12) 



This inequality together with (A. 5 1 and (A.IO I gives 



S) - $(x,S') 

>(^t{x,s),X- X) + (^j^{X°),X- - (7^^"), A - A") - ^11 A - 
= (jHx") + it (a, 5) , A - a) - ^11 A - XYf, 



=p(A°-A, A-a)-^||A-A°||^, 



= ^ IIA-AI 



lA- A" 



Hence, we have (A.6I. Suppose that A° satisfies (A. 7). Then for any {X,S) S 



pmxn w m>mxn 



, we have 



$(A, S") - ^(p'^ (a, S*) , S*) > $(A, S*) - Q"^ (p'^ (a, S*) I 1, 



(A.13) 



First order optimality conditions for (A.4) and (p being a closed convex function guarantee that there exists 
7'^(p'^ (a, S*) ) e d(l>(^p'^' (a, S") ) such that 

7* (p^ (a, ^) ) + 7^ (a, ^) + p{p^ (a, ^) - a) = 

Moreover, using the convexity of (/>(.) and ip{., .), we have 

0(A) > (a, ^) ) + (7*(p'/' (a, ^) ) , A - (a, S 

ijiX,S)>yj[x,s)+(-/t(x,s) 



A- A 



(A.14) 

(A.15) 
(A.16) 
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where (AJE^ follows from the fact that {x,S^ = argmin^,^ .5 Q'^(X, S"] X") implies (^-ff {x,S^ , G 
dip (^X, , i.e. we can set jf (^X, 5^ 



0. Summing the two inequalities (A. 15 1 and (A. 16 1 give 

,x -x"" 



^X, 5) > V {X, S) + {^t (x,s),X-x)+<ly (f (x, s) ) + (7^ (p'^' (x , s)) , X ~ (x , s)) . 



(A.17) 



This inequality together with (A. 7 1 and (A. 14 1 gives 



<^{X,S)-'^{p'^ [x,s),s) 
>{^^t{X,S),X-x) + (j'f' (p^' (1, 5) ) , X - (x, s) ) 



7,^(1,5), p^[x,s)-x) - ^\\p^'[x,s)~x\\ 



7* (p^ {X, S) ) + {X, s),X-p^ [X, s))-^\\p^ [X, S) X\\f, 

--P (x - p^ (1, s), X-p^ (1, s) ) - ^\\p^' [x, s) - x\\%, 

_p 
'2 



\x^p*(x,s) \\l-\\x-x\\ 



Hence, we have ( |A.8[ ). □ 

We are now ready to give the proof of Theorem 4.1.

Proof. Let / := {0 < i < fc - 1 : ^{X,+i,Si) < Cp{X,+i, Zi, S.r,Yi)} and r {0, 1, fc - 1} \ /. Since 
V(/)(.) is Lipschitz continuous with Lipschitz constant L and p > L, ^{pi{X),p'l{X)) < Q'*'{pi{X),p'l{X)\ X) 
is true for all X e Since ( |Al5| in Lemma [O] is true for all AT" e E™><", ( [Xe] ) is true for all 



(a:, S') e M"''" X K™^". Particularly, since for all i ^ I \J T 



= argming^(Z,5| 
z,s 



(A.18) 



setting {X,S) := {X*,S*) and X° := AT^+i in Lemma [AJ imply that pt{X,+i) = Z,+i, pt{X,+i) = S,+i 
and we have 



mx\s*)-^{z.,+,,s.,+,)) > \\z,+,-x*\\l-\\x,+,-x*\\] 



(A.19) 



Moreover, ( |A.18| ) implies that for alH e / U there exits {-/^{Zi,S,),jf{Z„ Si)) € ^VX^i, -^i) such that 

7^(Z„ 5,) + VHX^) + p{Z, - X,) = 0, (A.20) 
jf{Z,,S,)^0. (A.21) 
(A.20 1 and the definition of K^+i of Algorithm ALM-S shown in Algorithm [s] imply that 

Jt{Z^, S,) = -Vcf^iX,) + p{X, - Zi) = Y,. 
Hence, by defining Q'^(.| Zi, Si) according to ( |Al2| using -it{Z,, S^) = F^, for all X e M™^" we have 

/:p(x,z„5,;r,) = 4>{x) + i>{z,,s,) + - z,> + ^\\x- z,\\l - q'/'(x| z,,^,)- (A.22) 

for all i e iLil". Hence, for aU i G / Xi +i = argmiujsf £p(X, Zj, S'i; y^) = argmin^ (5'^(A|Zj, S'i). Thus, 
for all i e I, setting AT" X, in Lemma [X2I imply pt{X.,) = Z,, (AT^) = 5, and p^ {pt{Xi),pt{Xi)) = 
p^(Z„5,) = X,+i. For ah i e / we have $(X,+i,5,) < /:p(X,+i, Z„ 5,; F,) = g'/'(X,+i|Z„ 5,)- Hence, for 
alH e / setting X° := X, in Lemma [A^ satisfies (|Aj]). Therefore, setting (X, S) (X*, S"*) and X° := X, 
in Lemma |A.2| implies that 

-($(X*,5*)-$(X,+i,5,)) > ||X,+i-X*|||,- ||Z,-X*|||,. (A.23) 
For any $i \in I$, summing (A.19) and (A.23) gives
2 



(2$(X*,5*) - - > -X*\\j,- \\Z, - X*rp. 



Moreover, since X^+i ~ Zi for i <E and (A. 19) holds for all i e / U we trivially have 

^ {'i>{x*,s*) - a>(z,+i,5,+i)) > - - ||z, - 



(A.24) 



(A.25) 



Summing (A.24 1 and (A.25 1 over i = 0, 1, fc — 1 gives 
r, / fe-i 



(2|/| + \n) HX\S*) - - E > \\Zk - X*\\l - llZo - X*\\l. (A.26) 



For any i € / U 7'^, setting (A", 5*) := (ATi+i, S*;) and AT^ := A^+i in Lemma 



A.2 



gives 



Trivially, for i = 1, we also have 

2 



(A.27) 



(A.28) 



Moreover, since for all i € I setting X^ := Xi in Lemma A.2 satisfies (A.7I, setting {X^S) := {Zi,Si) and 
Ar° := AT," in Lemma 



A.2 



implies that 
2 



($(z„s'0 - > ll^.+i - ^^llf > 0. 



(A.29) 



And since X^+i = Zi for all i e J'^, (A.29) trivially holds for all i G I''. Thus, for all z e / U we have 

- 5,) - $(X,+i, 5,)) > 0. (A.30) 



Adding ( |A.27P and ( |A.30| yields <I>(Z,, S*,) > for alH G / U /'^ and adding ( |A.28| ) and ( |A.30[ ) 

yields $(Xj, S'j.i) > $(^^+1, 5*4) for all i = 1, fc - 1. Hence, 



fc-i 



> fc$(Zfc,^fc), and > nfc$(Xfc, 



1=0 



iei 



These two inequalities, (A.26) and the fact that Xq = Z^ imply 

2 
P 



((2|/| + |r|) <i>{X*,S*) - rik^Xk, Sk-i) ~ k<i>{Zk,Sk)) > -II Ao - X*\\l. 



(A.31) 



(A.32) 



Hence, K4| follows from the facts: 2|/| + 1/^=1 = k + Uk and nk^{Xk, Sk-i) + k^{Zk, Sk) > {k + nk)^{Zk, Sk) 
due to ( A.27|). □ 



Appendix B. Proof of Lemma 5.1.

Proof. Since Y* and 9* are optimal Lagrangian dual variables, we have 

{X*,X*,S*) = argmin ||X||, + ^ \\S\\i + {Y* , X - Z) + ^ {\\Z + S - D\\l - 
x,z,s 2 
Then from first-order optimality conditions, we have 



e d\\X*\\^ + Y*, 

G ^ d\\S*\\i+e*{X* + S* - D), 

-Y* +e*{x* +S* - D) = 0. 

Hence, -Y* e d\\X*\\^ and -Y* e ^ 

For k >0, since Xk+i is the optimal solution for the fc-th subproblem given in Step |4] in Algorithm [s] 
from the first-order optimality conditions it follows that 

e d\\Xk+i\U + Yk + PkiXk+i - Zk). (B.i) 

For fc > 0, let 0^ > Qhe the optimal Lagrange multiplier for the quadratic constraint in the fc-th subproblem 
given in Step [6] in Algorithm [sj Since {Sk+i, Zk+i) is the optimal solution, from the first-order optimality 
conditions it follows that 

G m\Sk+i\\i + 6k{Zk+i + - B). (B.2) 
-Yk + Pfc(Zfe+i - Xfe+i) + 6k{Zk+x + Sk+i -B)=Q. (B.3) 



From (B.ll, it follows that — Yfe+i G Hence, {Yk}kez+ is a bounded sequence. From (B.2) and 

(B.3), it follows that —Yk+i G ^ Hence, {Yk}kez,+ is also a bounded sequence. 

Furthermore, since Yfe+i - Yfe = pk{Xk+i - Zk+i) and Yk+i ~ Yk+i = Pk{Zk - Zk+i), we have 





^i-Yk,Yk- 


n-Y*) 










= {Xk+i - 


^ Zk+i, Yk^ 


i-Y*), 










= {Xk+i - 


-X*,Yk+i 


-Y*) + 


{X* - Zk, 


-1, Yk+i — Y*), 






= {Xk+i - 


- X* ,Yk+i 


- Yk+i) 


+ {Xk+1 — 


X* ,Yk+i — Y*) - 


f {X* — Zk+i,Yk+i 


-Y*), 


=Pk{Xk+ 


1 — X* ,Zk 


- Zk+i) 


+ {Xk+1 - 


-X*,Yk+i-Y*) 


+ {X* - Zk+uYk+i 


-Y*) 



Using the above equality, for all /c > f , we trivially have 

\\Zk+i-X*\\l + pl^\\Yk+i-Y*\\l 
^\\Zk - X*fp + Pfe'llFfe -Y*fp- \\Zk+i -ZkWl- Pk^\\Yk+i - YkWl, 

+ 2{Zk+i - X\Zk+i - Zk) + 2p-^^{Yk+i - Yk,Yk+i - Y*), 
= \\Zk - X*\\l + pl^WYk -Y*\\l- \\Zk+i -ZkWl- p-k^\\Yk+i - mil, 

+ 2(Zfe+i — X* , Zk+i — Zk) + 2{Xk+i — X* , Zk — Zk+i) 

- Ipl^ ((-ffe+i + Y\Xk+x - X*) + {-Yk+i +Y*,X*- Zk+i) 
= \\Zk - X*\\l + pl^WYk -Y*\\l- \\Zk+i - ZkWl - p-k^\\Yk+i - Ykfp, 

+ 2{Zk+i - Xk+u Zk+i - Zk) - 2p^^ [{-Yk+i + Y*,Xk+i - X*) + {-Yk+i +Y*,X*- Zk+i) 
= \\Zk - X*\\l + p^^WYk - - \\Zk+i - ZkWl - p-k^\\Yk+i - YkWl, 

- 2p-i ({Yk+i - Yk, Zk+i - Zk) + {-Yk+i + Y\Xk+i - X*) + {-Yk+i +Y*,X* - Zk+i)) (B.4) 



Since -Yk G ^ d\\Sk\\i for all A: > 1 and -Y* G C 9||5*||i, we have for all fc > 1 



{-Yk+i+Yk,Sk+i- Sk)>0, (B.5) 
{-Yk+i+Y*,Sk+i-S*)>0. (B.6) 
Since pk+i > Pk for all fc > 1, adding (B.5I, (B.6) to (B.4| and subtracting (B.6) from (B.4), we have 



* l|2 



\\Zk+i - x*\\p + Pj,_|_ii|yfe+i - Y 

< \\Zk+i-X*\\l + p-^Yk+i-Y*\\l 

< \\Zk-X*\\l + p-^Yk - Y*\\l - \\Zk+^ - ZkWl - PZ^Wk+i - YkWl 
- 2pfe 1 {{-Yk+i + Y\Xk+i - X*) + {-Yk+i + Y*,Sk+i - S* 



- 2pl^ {{Yk+i - Yk, Zk+i + Sk+i -Zk- Sk) + {-Yk+i +Y\X* +S* - Zk+i - Sk+i)) (B.7) 
Applying Lemma |3.2| on the k-ih. subproblem given in Step [6] in Algorithm [5] it follows that 

Using arguments similar to those used in the proof of Lemma |3.2[ one can also show that 

{Y\Y*) e dl^{X\S*). 

Moreover, since -Yfc e ^ d\\Sk\\i, G ^ d\\Xk\U for aU fc > 1, -Y* e ^ d\\S*\\i and -Y* e d\\X*\\^, we 
have 

{Yk+i - Yk, Zk+i + Sk+i -Zk- Sk) > 0, 
{-Yk+i +Y*,X*+S*- Zk+i - Sk+i) > 0, 
{-Yk+i + Y*,Sk+i-S*)>0, 
{-Yk+,+Y*,Xk+i-X*) >0, 



for all fc > 1. Therefore, the above inequalities and (B.7) together imply that {\\Zk — Ar*|||, + Pk'^\\Yk — 
Y*\\%}k£Z+ is a non-increasing sequence. Moreover, we also have 



J2 \\Zk+i-Zk\\%+p^^Yk+,-Yk\\l 
+2 pl^({-Yk+i+Y\Xk+i-X*) + {-Yk+i+Y\Sk+i-S* 
+2 ^ Pj, ^ ((^fc+i — Yk, Zk+i + Sk+i — Zk — Sk) + {—Yk+i + Y* ,X* + S* — Zk+i — Sk+i)) 



k£l 



= ^ {\\Zk-X*\\l+p-^\\Yk-Y*\\l-\\Zk+i-X*\\l-p-l^\\Yk+i-Y*\\l)<^ 
Appendix C. Additional Statistics for Numerical Experiments. 



Table C.1
NSA vs ASALM: Additional statistics on solution accuracy for decomposing $D \in \mathbb{R}^{n\times n}$, n = 1500, SNR(D) = 80dB

                                              c_r=0.05 c_p=0.05                        c_r=0.05 c_p=0.1
                                              NSA                 ASALM                NSA                 ASALM
Error Type                                    avg / max           avg / max            avg / max           avg / max
| ||X^sol||_* - ||X^0||_* | / ||X^0||_*       1.7E-6 / 5.2E-6     6.9E-6 / 1.0E-5      5.0E-6 / 3.2E-5     2.3E-5 / 3.8E-5
max{|σ_i - σ_i^0| : σ_i^0 > 0}                4.1E-2 / 5.2E-2     3.9E-2 / 4.7E-2      5.2E-2 / 1.8E-1     1.1E-1 / 1.7E-1
max{|σ_i| : σ_i^0 = 0}                        7.9E-13 / 2.2E-12   6.3E-13 / 1.6E-12    8.6E-13 / 2.0E-12   1.1E-12 / 2.0E-12
| ||S^sol||_1 - ||S^0||_1 | / ||S^0||_1       1.1E-5 / 1.4E-5     6.2E-6 / 9.7E-6      9.7E-6 / 1.5E-5     8.6E-5 / 9.7E-5
max{|S^sol_ij - S^0_ij| : S^0_ij ≠ 0}         2.9E-1 / 3.5E-1     5.9E-1 / 8.0E-1      2.2E-1 / 2.4E-1     5.9E-1 / 7.4E-1
max{|S^sol_ij| : S^0_ij = 0}                  0 / 0               4.0E-1 / 7.2E-1      8.3E-3 / 1.1E-2     1.9E-1 / 5.5E-1

                                              c_r=0.1 c_p=0.05                         c_r=0.1 c_p=0.1
                                              NSA                 ASALM                NSA                 ASALM
Error Type                                    avg / max           avg / max            avg / max           avg / max
| ||X^sol||_* - ||X^0||_* | / ||X^0||_*       5.6E-6 / 6.4E-6     4.6E-5 / 4.9E-5      6.2E-6 / 7.1E-6     1.2E-4 / 1.4E-4
max{|σ_i - σ_i^0| : σ_i^0 > 0}                5.7E-2 / 6.2E-2     1.2E-1 / 1.3E-1      8.8E-2 / 1.0E-1     3.0E-1 / 3.7E-1
max{|σ_i| : σ_i^0 = 0}                        6.9E-13 / 1.5E-12   6.2E-13 / 9.9E-13    6.2E-13 / 1.3E-12   3.9E-13 / 1.0E-12
| ||S^sol||_1 - ||S^0||_1 | / ||S^0||_1       1.2E-5 / 1.6E-5     1.6E-4 / 1.7E-4      3.4E-5 / 3.7E-5     2.5E-4 / 2.7E-4
max{|S^sol_ij - S^0_ij| : S^0_ij ≠ 0}         1.6E-1 / 1.9E-1     6.7E-1 / 8.3E-1      1.7E-1 / 2.0E-1     7.9E-1 / 9.5E-1
max{|S^sol_ij| : S^0_ij = 0}                  7.0E-3 / 1.1E-2     1.5E-1 / 2.5E-1      1.3E-2 / 1.9E-2     1.2E-1 / 2.5E-1


Table C.2
NSA vs ASALM: Additional statistics on solution accuracy for decomposing $D \in \mathbb{R}^{n\times n}$, n = 1500, SNR(D) = 45dB

                                              c_r=0.05 c_p=0.05                        c_r=0.05 c_p=0.1
                                              NSA                 ASALM                NSA                 ASALM
Error Type                                    avg / max           avg / max            avg / max           avg / max
| ||X^sol||_* - ||X^0||_* | / ||X^0||_*       1.8E-4 / 3.6E-4     1.4E-3 / 1.5E-3      2.2E-4 / 2.4E-4     2.4E-3 / 2.6E-3
max{|σ_i - σ_i^0| : σ_i^0 > 0}                5.9E-1 / 1.8E+0     1.1E+0 / 1.5E+0      9.8E-1 / 1.1E+0     2.3E+0 / 2.6E+0
max{|σ_i| : σ_i^0 = 0}                        6.4E-13 / 1.3E-12   3.7E+0 / 3.8E+0      6.1E-13 / 1.0E-12   4.7E+0 / 5.5E+0
| ||S^sol||_1 - ||S^0||_1 | / ||S^0||_1       1.7E-4 / 1.9E-4     4.2E-3 / 4.3E-3      1.3E-4 / 1.3E-4     2.9E-3 / 3.6E-3
max{|S^sol_ij - S^0_ij| : S^0_ij ≠ 0}         1.0E+0 / 1.2E+0     3.0E+0 / 3.6E+0      1.3E+0 / 1.4E+0     3.2E+0 / 3.8E+0
max{|S^sol_ij| : S^0_ij = 0}                  3.6E-1 / 4.0E-1     2.2E+0 / 2.6E+0      5.3E-1 / 6.1E-1     2.3E+0 / 3.1E+0

                                              c_r=0.1 c_p=0.05                         c_r=0.1 c_p=0.1
                                              NSA                 ASALM                NSA                 ASALM
Error Type                                    avg / max           avg / max            avg / max           avg / max
| ||X^sol||_* - ||X^0||_* | / ||X^0||_*       3.7E-4 / 6.5E-4     9.7E-5 / 1.3E-4      6.7E-4 / 6.8E-4     8.4E-4 / 9.0E-4
max{|σ_i - σ_i^0| : σ_i^0 > 0}                1.3E+0 / 1.5E+0     1.2E+0 / 1.3E+0      2.5E+0 / 2.8E+0     1.3E+0 / 1.5E+0
max{|σ_i| : σ_i^0 = 0}                        1.6E-1 / 1.6E+0     3.6E+0 / 3.7E+0      7.3E-13 / 1.7E-12   3.2E+0 / 3.3E+0
| ||S^sol||_1 - ||S^0||_1 | / ||S^0||_1       8.1E-4 / 3.2E-3     4.7E-3 / 4.8E-3      8.9E-4 / 9.0E-4     4.4E-3 / 4.5E-3
max{|S^sol_ij - S^0_ij| : S^0_ij ≠ 0}         9.3E-1 / 1.1E+0     2.7E+0 / 3.3E+0      1.1E+0 / 1.2E+0     3.2E+0 / 3.5E+0
max{|S^sol_ij| : S^0_ij = 0}                  5.7E-1 / 6.6E-1     1.1E+0 / 1.4E+0      7.1E-1 / 7.9E-1     1.3E+0 / 1.6E+0
Table C.3
NSA: Additional statistics on solution accuracy for decomposing $D \in \mathbb{R}^{n\times n}$, $n \in \{500, 1000, 1500\}$, SNR(D) = 80dB

                                                     c_r=0.05 c_p=0.05   c_r=0.05 c_p=0.1    c_r=0.1 c_p=0.05    c_r=0.1 c_p=0.1
n      Error Type                                    avg / max           avg / max           avg / max           avg / max
500    | ||X^sol||_* - ||X^0||_* | / ||X^0||_*       7.2E-6 / 1.1E-5     2.0E-5 / 2.7E-5     5.6E-6 / 8.2E-6     2.1E-5 / 3.1E-5
       max{|σ_i - σ_i^0| : σ_i^0 > 0}                1.7E-2 / 2.4E-2     3.4E-2 / 5.6E-2     2.1E-2 / 2.7E-2     3.2E-2 / 3.8E-2
       max{|σ_i| : σ_i^0 = 0}                        1.6E-13 / 2.9E-13   2.0E-13 / 5.6E-13   1.1E-13 / 2.5E-13   8.6E-14 / 1.7E-13
       | ||S^sol||_1 - ||S^0||_1 | / ||S^0||_1       1.6E-5 / 1.7E-5     1.5E-5 / 1.8E-5     2.9E-5 / 3.2E-5     2.6E-5 / 3.0E-5
       max{|S^sol_ij - S^0_ij| : S^0_ij ≠ 0}         3.2E-1 / 4.0E-1     3.0E-1 / 4.3E-1     2.6E-1 / 3.2E-1     1.8E-1 / 2.3E-1
       max{|S^sol_ij| : S^0_ij = 0}                  9.5E-3 / 2.2E-2     1.5E-2 / 2.5E-2     1.5E-2 / 2.5E-2     1.8E-2 / 3.4E-2
1000   | ||X^sol||_* - ||X^0||_* | / ||X^0||_*       5.6E-6 / 1.7E-5     6.2E-6 / 1.7E-5     6.9E-6 / 8.6E-6     1.5E-6 / 2.6E-6
       max{|σ_i - σ_i^0| : σ_i^0 > 0}                1.8E-2 / 4.0E-2     3.1E-2 / 4.8E-2     5.1E-2 / 6.0E-2     5.9E-2 / 6.8E-2
       max{|σ_i| : σ_i^0 = 0}                        3.3E-13 / 4.8E-13   3.3E-13 / 5.0E-13   2.9E-13 / 6.6E-13   2.8E-13 / 4.8E-13
       | ||S^sol||_1 - ||S^0||_1 | / ||S^0||_1       1.1E-5 / 1.5E-5     1.7E-5 / 1.9E-5     2.8E-5 / 3.0E-5     2.9E-5 / 3.0E-5
       max{|S^sol_ij - S^0_ij| : S^0_ij ≠ 0}         2.7E-1 / 3.1E-1     3.1E-1 / 3.8E-1     2.2E-1 / 2.8E-1     1.6E-1 / 1.7E-1
       max{|S^sol_ij| : S^0_ij = 0}                  1.7E-4 / 9.7E-4     1.2E-2 / 1.7E-2     7.8E-3 / 1.2E-2     1.2E-2 / 1.5E-2
1500   | ||X^sol||_* - ||X^0||_* | / ||X^0||_*       1.7E-6 / 5.2E-6     5.0E-6 / 3.2E-5     5.6E-6 / 6.4E-6     6.2E-6 / 7.1E-6
       max{|σ_i - σ_i^0| : σ_i^0 > 0}                4.1E-2 / 5.2E-2     5.2E-2 / 1.8E-1     5.7E-2 / 6.2E-2     8.8E-2 / 1.0E-1
       max{|σ_i| : σ_i^0 = 0}                        7.9E-13 / 2.2E-12   8.6E-13 / 2.0E-12   6.9E-13 / 1.5E-12   6.2E-13 / 1.3E-12
       | ||S^sol||_1 - ||S^0||_1 | / ||S^0||_1       1.1E-5 / 1.4E-5     9.7E-6 / 1.5E-5     1.2E-5 / 1.6E-5     3.4E-5 / 3.7E-5
       max{|S^sol_ij - S^0_ij| : S^0_ij ≠ 0}         2.9E-1 / 3.5E-1     2.2E-1 / 2.4E-1     1.6E-1 / 1.9E-1     1.7E-1 / 2.0E-1
       max{|S^sol_ij| : S^0_ij = 0}                  0 / 0               8.3E-3 / 1.1E-2     7.0E-3 / 1.1E-2     1.3E-2 / 1.9E-2



Table C.4 

NSA: Additional statistics on solution accuracy for decomposing $D \in \mathbb{R}^{n \times n}$, $n \in \{500, 1000, 1500\}$, SNR(D) = 45dB





n     Error type                                          c_r=0.05, c_p=0.05  c_r=0.05, c_p=0.1   c_r=0.1, c_p=0.05   c_r=0.1, c_p=0.1
                                                          avg / max           avg / max           avg / max           avg / max

500   $|\|X^{sol}\|_*-\|X^0\|_*| / \|X^0\|_*$             6.0E-4 / 9.3E-4     5.5E-4 / 6.2E-4     7.4E-4 / 8.8E-4     1.0E-3 / 1.3E-3
      $\max\{|\sigma_i-\sigma_i^0| : \sigma_i^0>0\}$      5.1E-1 / 7.8E-1     5.4E-1 / 7.7E-1     8.2E-1 / 8.9E-1     9.2E-1 / 1.2E+0
      $\max\{|\sigma_i| : \sigma_i^0=0\}$                 1.7E-13 / 2.7E-13   1.6E-13 / 3.0E-13   1.0E-13 / 2.1E-13   1.1E-1 / 6.0E-1
      $|\|S^{sol}\|_1-\|S^0\|_1| / \|S^0\|_1$             3.0E-4 / 3.4E-4     2.1E-4 / 2.9E-4     3.0E-4 / 1.2E-3     6.4E-4 / 1.1E-3
      $\max\{|S_{ij}^{sol}-S_{ij}^0| : S_{ij}^0\neq 0\}$  1.6E+0 / 1.9E+0     1.4E+0 / 1.8E+0     1.2E+0 / 1.6E+0     1.0E+0 / 1.3E+0
      $\max\{|S_{ij}^{sol}| : S_{ij}^0=0\}$               2.3E-1 / 2.9E-1     4.0E-1 / 4.7E-1     3.5E-1 / 4.6E-1     5.4E-1 / 6.1E-1

1000  $|\|X^{sol}\|_*-\|X^0\|_*| / \|X^0\|_*$             2.8E-4 / 3.1E-4     4.4E-4 / 7.5E-4     5.6E-4 / 8.0E-4     7.4E-4 / 8.4E-4
      $\max\{|\sigma_i-\sigma_i^0| : \sigma_i^0>0\}$      5.2E-1 / 6.2E-1     8.6E-1 / 1.2E+0     1.7E+0 / 1.9E+0     1.8E+0 / 1.9E+0
      $\max\{|\sigma_i| : \sigma_i^0=0\}$                 2.5E-13 / 5.3E-13   4.3E-13 / 9.0E-13   2.0E-1 / 2.0E+0     6.3E-1 / 3.9E+0
      $|\|S^{sol}\|_1-\|S^0\|_1| / \|S^0\|_1$             2.2E-4 / 2.3E-4     1.4E-4 / 1.7E-4     5.5E-4 / 3.7E-3     1.1E-3 / 2.5E-3
      $\max\{|S_{ij}^{sol}-S_{ij}^0| : S_{ij}^0\neq 0\}$  1.3E+0 / 1.5E+0     1.5E+0 / 1.8E+0     1.1E+0 / 1.3E+0     9.6E-1 / 1.1E+0
      $\max\{|S_{ij}^{sol}| : S_{ij}^0=0\}$               2.7E-1 / 3.2E-1     4.6E-1 / 5.2E-1     4.6E-1 / 5.1E-1     6.4E-1 / 6.7E-1

1500  $|\|X^{sol}\|_*-\|X^0\|_*| / \|X^0\|_*$             1.8E-4 / 3.6E-4     2.2E-4 / 2.4E-4     3.7E-4 / 6.5E-4     6.7E-4 / 6.8E-4
      $\max\{|\sigma_i-\sigma_i^0| : \sigma_i^0>0\}$      5.9E-1 / 1.8E+0     9.8E-1 / 1.1E+0     1.3E+0 / 1.5E+0     2.5E+0 / 2.8E+0
      $\max\{|\sigma_i| : \sigma_i^0=0\}$                 6.4E-13 / 1.3E-12   6.1E-13 / 1.0E-12   1.6E-1 / 1.6E+0     7.3E-13 / 1.7E-12
      $|\|S^{sol}\|_1-\|S^0\|_1| / \|S^0\|_1$             1.7E-4 / 1.9E-4     1.3E-4 / 1.3E-4     8.1E-4 / 3.2E-3     8.9E-4 / 9.0E-4
      $\max\{|S_{ij}^{sol}-S_{ij}^0| : S_{ij}^0\neq 0\}$  1.0E+0 / 1.2E+0     1.3E+0 / 1.4E+0     9.3E-1 / 1.1E+0     1.1E+0 / 1.2E+0
      $\max\{|S_{ij}^{sol}| : S_{ij}^0=0\}$               3.6E-1 / 4.0E-1     5.3E-1 / 6.1E-1     5.7E-1 / 6.6E-1     7.1E-1 / 7.9E-1
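
The six error statistics listed under "Error Type" in Tables C.3 and C.4 can be computed for a single problem instance along the following lines. This is only an illustrative sketch, not the code used for the experiments: the function name `accuracy_stats` and the dictionary keys are hypothetical, the singular values of $X^{sol}$ and $X^0$ are assumed to be precomputed and sorted in decreasing order (for example with `numpy.linalg.svd(X, compute_uv=False)`), and the singular values of $X^0$ beyond its true rank, as well as the entries of $S^0$ outside its support, are assumed to be stored as exact zeros.

```python
def accuracy_stats(sigma_sol, sigma_0, S_sol, S_0):
    """Six per-instance error statistics in the style of Tables C.3 and C.4.

    sigma_sol, sigma_0 : singular values of X^sol and X^0 (same length,
        decreasing order); entries of sigma_0 past rank(X^0) must be exactly 0.
    S_sol, S_0 : sparse components given as equally sized lists of rows;
        entries of S_0 outside its support must be exactly 0.
    """
    # Nuclear norm = sum of singular values.
    nuc_sol, nuc_0 = sum(sigma_sol), sum(sigma_0)
    sv_pairs = list(zip(sigma_sol, sigma_0))

    # Flatten both sparse matrices into (computed, true) entry pairs.
    s_pairs = [(a, b) for row_a, row_b in zip(S_sol, S_0)
               for a, b in zip(row_a, row_b)]
    l1_sol = sum(abs(a) for a, _ in s_pairs)
    l1_0 = sum(abs(b) for _, b in s_pairs)

    return {
        # | ||X^sol||_* - ||X^0||_* | / ||X^0||_*
        "rel_nuclear": abs(nuc_sol - nuc_0) / nuc_0,
        # max{ |sigma_i - sigma_i^0| : sigma_i^0 > 0 }
        "sigma_on": max((abs(a - b) for a, b in sv_pairs if b > 0), default=0.0),
        # max{ |sigma_i| : sigma_i^0 = 0 }
        "sigma_off": max((abs(a) for a, b in sv_pairs if b == 0), default=0.0),
        # | ||S^sol||_1 - ||S^0||_1 | / ||S^0||_1
        "rel_l1": abs(l1_sol - l1_0) / l1_0,
        # max{ |S^sol_ij - S^0_ij| : S^0_ij != 0 }
        "S_on": max((abs(a - b) for a, b in s_pairs if b != 0), default=0.0),
        # max{ |S^sol_ij| : S^0_ij = 0 }
        "S_off": max((abs(a) for a, b in s_pairs if b == 0), default=0.0),
    }


# Tiny hand-made instance (illustrative numbers only, not experimental data).
stats = accuracy_stats(
    sigma_sol=[2.5, 1.25, 0.125],
    sigma_0=[3.0, 1.0, 0.0],
    S_sol=[[1.5, 0.25], [0.0, -4.5]],
    S_0=[[2.0, 0.0], [0.0, -4.0]],
)
```

The "avg / max" entries in the tables would then correspond to taking the average and the maximum of each statistic over the problem instances of a given setting.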
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